Codons
As we have learned, DNA and RNA are made up of sequences of nitrogenous bases: adenine, thymine (uracil in RNA), guanine, and cytosine. Scientists have long understood that the order of these bases carries the information that codes for specific amino acids. However, it took some time before they figured out how the bases accomplished this coding.
The main problem was that while there are only 4 nucleotides, there are 20 amino acids for which those nucleotides have to code. If adenine, thymine (uracil in mRNA), guanine, and cytosine each coded for a particular amino acid, then the DNA/mRNA information system would only be able to code for 4 amino acids. If, however, groups of two nucleotides coded for a single amino acid, the story is somewhat different. Taken in groups of two, the four nucleotides give sixteen possible combinations (written here with the mRNA bases: AA, AU, AG, AC, UA, UU, UG, UC, GA, GU, GG, GC, CA, CU, CG, CC), but sixteen is still not enough to code for twenty amino acids. If the nucleotides code for amino acids in groups of three, however, there are sixty-four possible combinations, which is more than enough. Experiments have verified that nucleotides code for amino acids in successive groups of three. These groups of three are called codons.
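The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch: the names BASES, count_words, and enumerate_words are made up for this example and do not come from the text.

```python
from itertools import product

BASES = "ACGU"  # the four mRNA bases (illustrative constant)


def count_words(length):
    """Number of distinct base sequences of the given length: 4 ** length."""
    return len(BASES) ** length


def enumerate_words(length):
    """List every possible base sequence of the given length."""
    return ["".join(combo) for combo in product(BASES, repeat=length)]


for n in (1, 2, 3):
    enough = count_words(n) >= 20  # are there enough words for 20 amino acids?
    print(n, count_words(n), enough)
# 1 4 False
# 2 16 False
# 3 64 True
```

Groups of three are the shortest grouping that yields at least twenty distinct combinations, which matches the reasoning in the paragraph above.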
Degeneracy of the Genetic Code
As we know, since the genetic code is read in triplets and there are four possible bases that can occupy each position, the number of possible codons is 4 × 4 × 4, or 64 codons. However, there are only 20 amino acids found in most organisms. Experiments have shown that three codons function as stop codons, acting as termination signals in translation. Yet that brings the count up to only twenty-three necessary codons. The large difference between the number of possible codons and the number needed means, as seen in the figure below, that most amino acids are specified by more than one codon (methionine and tryptophan, with one codon each, are the exceptions). Because several different codons can specify the same amino acid, the genetic code is called "degenerate."
Figure 6.09: The Entire Genetic Code Dictionary
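As a rough check on the degeneracy described above, the standard genetic code can be written out as a lookup table and the codons per amino acid counted. The sketch below assumes the standard code with one-letter amino acid abbreviations, using "*" for a stop codon. The names BASES, AMINO_ACIDS, CODON_TABLE, and codons_per_residue are chosen for this example only.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # conventional ordering used in codon tables

# One-letter amino acid codes for all 64 codons, listed in the order
# UUU, UUC, UUA, UUG, UCU, ... GGG. "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each triplet (for example "AUG") to its amino acid letter.
CODON_TABLE = {
    "".join(triplet): residue
    for triplet, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons specify each amino acid (or stop).
codons_per_residue = Counter(CODON_TABLE.values())

single_codon = sorted(r for r, n in codons_per_residue.items() if n == 1)

print(len(CODON_TABLE))          # 64
print(codons_per_residue["*"])   # 3
print(codons_per_residue["L"])   # 6
print(single_codon)              # ['M', 'W']
```

Of the 64 codons, 61 specify amino acids and 3 are stop codons. Only methionine (M) and tryptophan (W) have a single codon; every other amino acid has between two and six.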
Rules of the Genetic Code
In addition to the principles we have already discussed, there are basic rules that govern the translation of the genetic code into proteins. There are three principal rules we will discuss:
- The sequence of bases in a codon must follow the direction of translation.
- The code is non-overlapping.
- The code is read in a fixed reading frame.
Rule 1
The first rule is somewhat basic. It says that since mRNA is translated in the 5' to 3' direction, the bases of each codon must be written in that same orientation in order to be translated properly. This simply means that the first base of a codon is the one located at the 5'-most end of the codon. Codons must always be read from 5' to 3'.
Rule 2
The second rule means that any one nucleotide can be a part of only one codon. It cannot be a part of two different codons. Therefore, successive codons are composed of adjacent, not overlapping, trinucleotides. For example, given the sequence AACU, AAC could be a codon, with U starting a new codon, or ACU could be a codon, with the first A being the last letter of the previous codon. But AAC and ACU cannot both be codons at the same time.
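The AACU example can be illustrated with a short sketch that contrasts adjacent, non-overlapping triplets with the overlapping windows that the rule forbids. The function names split_codons and overlapping_triplets are hypothetical and exist only for this illustration.

```python
def split_codons(seq, start=0):
    """Split seq into adjacent, non-overlapping triplets beginning at start.

    Leftover bases at the end that do not form a full triplet are dropped.
    """
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def overlapping_triplets(seq):
    """Every triplet window in seq, sliding one base at a time.

    This is NOT how the genetic code is read; it is shown only for contrast.
    """
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


print(split_codons("AACU", start=0))   # ['AAC']
print(split_codons("AACU", start=1))   # ['ACU']
print(overlapping_triplets("AACU"))    # ['AAC', 'ACU']
print(split_codons("AACUGG"))          # ['AAC', 'UGG']
```

Each call to split_codons uses every base at most once, so it yields either AAC or ACU depending on where reading begins, never both.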
Rule 3
The final rule states that once you begin reading the code from a specific nucleotide, you continue reading it by threes until the end. The beginning of an amino acid sequence is specified by a start codon located somewhere in the mRNA sequence. This is usually AUG, but it can also be GUG. The end of a sequence is specified by one of three stop codons: UAA, UAG, or UGA. A consequence of this rule is that any given sequence can be read in three different reading frames, depending on which base one begins with. For example, the sequence ACGACGACGACGACG can be read in the following three ways.
1.) ACG ACG ACG ACG ACG (each codon specifies the amino acid threonine)
2.) A CGA CGA CGA CGA CG (each codon specifies the amino acid arginine)
3.) AC GAC GAC GAC GAC G (each codon specifies the amino acid aspartic acid)
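The three reading frames can be reproduced with a small sketch. It assumes the standard genetic code, written with one-letter amino acid abbreviations (T = threonine, R = arginine, D = aspartic acid, M = methionine, "*" = stop). The helper names are made up for this example, and translate_from_start makes the simplifying assumption that translation begins at the first AUG it finds, ignoring the rarer GUG start.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Standard genetic code: triplet -> one-letter amino acid, "*" = stop.
CODON_TABLE = {
    "".join(triplet): residue
    for triplet, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_in_frame(seq, frame):
    """Non-overlapping triplets read from offset frame (0, 1, or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def translate_frame(seq, frame):
    """Translate every complete codon in the chosen reading frame."""
    return "".join(CODON_TABLE[c] for c in codons_in_frame(seq, frame))


def translate_from_start(mrna):
    """Translate from the first AUG up to (not including) the first stop.

    Simplifying assumption: only AUG is treated as a start codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for codon in codons_in_frame(mrna, start):
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


sequence = "ACGACGACGACGACG"
for frame in range(3):
    print(frame, codons_in_frame(sequence, frame), translate_frame(sequence, frame))
# 0 ['ACG', 'ACG', 'ACG', 'ACG', 'ACG'] TTTTT
# 1 ['CGA', 'CGA', 'CGA', 'CGA'] RRRR
# 2 ['GAC', 'GAC', 'GAC', 'GAC'] DDDD

print(translate_from_start("GGAUGACGCGAGACUAAGG"))  # MTRD
```

In the last call, the start codon fixes the reading frame: reading proceeds AUG, ACG, CGA, GAC and stops at UAA, so the bases before the AUG and after the stop codon are not translated.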