Genetics Problem Spaces

Codon Usage

Introductory Exercise

Introduction

The genetic code is described as degenerate because most amino acids are specified by more than one codon. For example, four codons correspond to the amino acid valine: GUU, GUC, GUA, and GUG. All four direct the incorporation of valine into the polypeptide chain, so we might expect each to be used in about equal proportions (roughly 25% each). However, this is not the case for many species. For example, analysis of genes in E. coli shows that some valine codons are used more frequently than others: GUU is used 25% of the time, GUC 21%, GUA 15%, and GUG 38%. This unequal use of codons with identical functions is referred to as codon bias.

The biological significance of codon bias is unclear. The uneven use of codons is too extreme to be accounted for by chance deviation. Additionally, different species show different codon biases. For example, E. coli prefers the GUG codon for valine (38%), while H. sapiens uses this codon only 10% of the time and instead prefers the GUC codon (40%).

Many different explanations have been proposed for codon bias. Some researchers have hypothesized that codon bias is a genetic adaptation to slight differences in the translational machinery of different species. For example, some species might not contain equal amounts of all the cognate tRNAs for a particular amino acid. Consider the codons for valine. As mentioned before, there are several different codons for valine, so the cell could have several different valine tRNAs with differing anticodons. It is possible that these different valine tRNAs are not all equally abundant in the cell. For example, E. coli may have more valine tRNAs that bind to GUG codons than valine tRNAs that bind to GUA codons. In that case, there may be selection for alleles that use the GUG codon for valine rather than the GUA codon. After thousands of generations of selection, this would result in codon bias in the genome.

Others have proposed that codon bias is not a response to selective pressure caused by biases in tRNA populations. Alternative factors proposed to lead to codon bias include sequences related to secondary structure in mRNA, sequences that promote stability of mRNA and sequences that facilitate subcellular localization of the mRNA and the protein products.

Laboratory Exercise

For this laboratory exercise you will investigate genes from a single phylum to determine whether that phylum demonstrates codon bias. Specifically, you will identify 5 different protein-encoding genes for your phylum. You will then translate the ORF of each gene and identify all the valine-encoding codons. You will then prepare a summary table reporting the frequency at which each of the codons is used. Finally, you will use a chi-square test to determine whether your phylum shows random codon usage.

You will be assigned a phylum to investigate at the beginning of the lab period.

DNA Databases

To conduct this analysis you will need to find DNA sequences of different genes within your assigned phylum. The world's largest collection of gene sequences is available in the DNA databases of the National Center for Biotechnology Information (NCBI). The easiest way to access these databases is through the web interface known as "Entrez". Entrez operates like internet search engines such as "Yahoo", except that it searches DNA databases.

Instructions for identifying genes on Entrez

1. Access the Entrez site on your internet browser using the following URL.

http://www.ncbi.nlm.nih.gov/Entrez/

2. Click on "Nucleotide" on the black bar to search for gene sequences.

3. Enter your search parameters into the search window. So that you can identify complete gene sequences from your phylum, I suggest that you use the following terms for your search parameters.

"your phylum" complete cds

4. To examine the first database entry click on the first blue accession number.

5. Review the components of the database entry. In particular, look at the "organism" line to confirm that this gene comes from your phylum. Because of the way the database is organized, it is possible that your search will identify genes from other phyla. If the gene is from another phylum, return to your search results and examine the next database entry. Additionally, it is important not to use any genes from the mitochondrial genome, because the mitochondrion has a different codon bias than the nucleus.

6. To identify the sequence corresponding to the ORF, find the CDS link under "Features". Click this link to get a database entry for the portion of the gene corresponding to the ORF.

7. Next to the display button there is a drop-down window. Select "Fasta" in the drop-down window and click the display button. This will generate an ORF sequence with no notations or numbers. You can use the mouse to select and copy the DNA sequence in this FASTA format.
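If you prefer to handle the copied FASTA text in a script rather than by hand, the following is a minimal Python sketch of how a single FASTA record could be reduced to a bare DNA string. The function name read_fasta_sequence and the example record are hypothetical (the sequence is made up for illustration); the only format assumption is that the first line is a header beginning with ">" and the remaining lines are sequence.

```python
def read_fasta_sequence(fasta_text):
    """Return the DNA sequence from a single-record FASTA string, uppercased."""
    lines = [line.strip() for line in fasta_text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    # Everything after the header is sequence; join the wrapped lines.
    sequence = "".join(lines[1:]).upper()
    # Reject anything other than the four bases (and N) so a bad paste is noticed.
    unexpected = set(sequence) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {sorted(unexpected)}")
    return sequence


# Hypothetical record: the header and sequence are invented for this example.
example = """>hypothetical_cds made-up example record
ATGGTTGTCAAA
GTAGTGTAA
"""
cds = read_fasta_sequence(example)
print(cds, len(cds))
```

A CDS copied as described above should have a length that is a multiple of three, which is a quick sanity check before translating it.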

Translation of ORF

NCBI offers a number of on-line programs for analyzing DNA sequences. One of these programs, ORF Finder, identifies the open reading frames in a gene sequence and displays the predicted amino acid sequence above the nucleotide sequence.

Instructions for translating ORF on ORF Finder.

1. Go to the NCBI home page. Click on the NCBI link in the upper corner of the Entrez page or go to the following URL.

http://www.ncbi.nlm.nih.gov/

2. Click on the "ORF Finder" link on the right hand side of the page listed under "Hot Spots".

3. Paste the FASTA-format CDS sequence you copied from Entrez into the FASTA window. Click the ORFFind button.

4. A diagrammatic representation of the six possible reading frames will be displayed. Each bar represents one of the possible frames. Green regions identify the locations of ORFs with start and stop codons. Typically, the top bar will have the longest green region. Click on the bar to obtain the sequence of this frame.

5. Scroll down and examine the DNA and polypeptide sequence. The start and stop codons are indicated in green and purple, respectively.

6. Copy the DNA/polypeptide sequence into a Word document. Make sure the document is in a Courier font and adjust the margins to align the DNA sequences.
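To see what the translation step does, here is a small Python sketch that translates a CDS codon by codon. It is not ORF Finder's implementation: it assumes the sequence already starts at the first base of the ORF, and it uses the standard genetic code (consistent with the instruction to avoid mitochondrial genes, which use a different code). The names CODON_TABLE and translate are illustrative.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a CDS in frame 1, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA-style input as well
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks unknown codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up sequence for illustration: ATG GTT GTC AAA GTA GTG TAA
print(translate("ATGGTTGTCAAAGTAGTGTAA"))  # MVVKVV
```

Each V in the output corresponds to one valine codon in the DNA, which is what you will mark and tally in the next section.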

Analysis of Codon Usage

There are four codons for the amino acid valine. In the DNA sequences used here they appear as GTT, GTC, GTA, and GTG (the DNA equivalents of GUU, GUC, GUA, and GUG). On the translated sequences, identify all of the valines in the predicted protein. Use a highlighter to mark every valine codon in the gene, and record the number of times each valine codon is used.
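The tally can also be checked with a short script. The Python sketch below counts valine codons in reading frame 1 only, so a GTG that straddles two codons is not counted. The function name count_valine_codons and the example sequences are made up for illustration.

```python
VALINE_CODONS = ("GTT", "GTC", "GTA", "GTG")


def count_valine_codons(cds):
    """Count each valine codon in frame 1 of a CDS (DNA or RNA letters)."""
    cds = cds.upper().replace("U", "T")
    counts = {codon: 0 for codon in VALINE_CODONS}
    # Step through the sequence three bases at a time so only in-frame codons count.
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    return counts


# Made-up sequence: ATG GTG GTT GTG AAA GTG TAA
counts = count_valine_codons("ATGGTGGTTGTGAAAGTGTAA")
total = sum(counts.values())
for codon, n in counts.items():
    print(codon, n, f"{100 * n / total:.0f}%")
```

Counting in frame matters: the sequence ATGAGTGCA contains the letters GTG, but read as ATG AGT GCA it has no valine codon, and the function reports zero for all four codons.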

Chi Square Analysis of Codon Bias.

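Use chi-square to test the hypothesis that your phylum displays no codon bias. Your null hypothesis is that the four valine codons are used in equal frequencies in your phylum, so the expected count for each codon is one quarter of the total number of valine codons observed. With four categories, the test has 3 degrees of freedom.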
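As a worked illustration of the calculation, here is a Python sketch of the chi-square goodness-of-fit test against equal usage. The counts are invented (they are not data from any phylum), and the function names are hypothetical. The critical value 7.815 is the standard table value for 3 degrees of freedom at the 0.05 level; the p-value helper uses a closed-form expression that applies only when there are exactly 3 degrees of freedom.

```python
import math


def chi_square_equal_usage(observed):
    """Chi-square statistic and degrees of freedom for H0: all codons equally used."""
    total = sum(observed.values())
    if total == 0:
        raise ValueError("no codons observed")
    expected = total / len(observed)  # same expected count for every codon
    statistic = sum((o - expected) ** 2 / expected for o in observed.values())
    return statistic, len(observed) - 1


def p_value_df3(statistic):
    """Upper-tail probability of a chi-square statistic with exactly 3 df."""
    return (math.erfc(math.sqrt(statistic / 2))
            + math.sqrt(2 * statistic / math.pi) * math.exp(-statistic / 2))


# Hypothetical pooled counts from five genes (made up for this example).
pooled = {"GTT": 25, "GTC": 21, "GTA": 15, "GTG": 39}
statistic, df = chi_square_equal_usage(pooled)
CRITICAL_VALUE_DF3_ALPHA_05 = 7.815

print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value_df3(statistic):.4f}")
if statistic > CRITICAL_VALUE_DF3_ALPHA_05:
    print("Reject the null hypothesis of equal codon usage.")
else:
    print("Fail to reject the null hypothesis of equal codon usage.")
```

With these made-up counts the expected count is 25 per codon and the statistic is 12.48, which exceeds 7.815, so equal usage would be rejected at the 0.05 level. Your own conclusion depends on the counts you record.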

Laboratory Assignment

1. Use Entrez to identify five different genes from your phylum. Your genes must come from at least 3 different species within your phylum. You should use genes with CDSs that are between 300 and 3000 nucleotides long.

2. Identify and translate the ORF of the gene using the ORF Finder program.

3. Print and save a Word version of the translated sequence. See the attached example (appendix below table).

4. Identify all the valine codons and record the number of times each codon is used in each gene. Report codon usage in a table format. See the attached format.

5. Use the summary data from the five genes to investigate whether your phylum demonstrates codon bias.

6. Write a brief lab report describing your findings. Follow the format described in Pechenick.

Your report should include an Introduction, Materials and Methods, Results and Discussion sections. In your results, include two tables to complement the narrative: the summary table of valine codon usage and your chi-square analysis. Most reports will be approximately 6 typed pages.

7. Attach your five translated sequences to the back of your report as an appendix of the raw data. You do not need to discuss this raw data in detail in your results or discussion section.

8. The report is due at the beginning of class on March 18, 2005. Reports turned in after March 18 will be penalized 10% for each day they are late.