Kozak consensus sequence

From Infogalactic: the planetary knowledge core

The Kozak consensus sequence, Kozak consensus or Kozak sequence, is a sequence which occurs on eukaryotic mRNA and has the consensus (gcc)gccRccAUGG. The Kozak consensus sequence plays a major role in the initiation of the translation process.[1] The sequence was named after the person who brought it to prominence, Marilyn Kozak.

The sequence is identified by the notation (gcc)gccRccAUGG, which summarizes data analysed by Kozak from 699 vertebrate mRNAs[2] as follows:

  1. a lower case letter denotes the most common base at a position where the base can nevertheless vary;
  2. upper case letters indicate highly conserved bases: the 'AUGG' sequence is constant or changes only rarely. The exception is 'R', the IUPAC ambiguity code[3] for a purine (adenine or guanine); a purine is always observed at this position, with adenine reported by Kozak to be the more frequent; and
  3. the sequence in parentheses, (gcc), is of uncertain significance.
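
The notation described in the list above can be turned into a pattern for searching RNA sequences. The following is a minimal sketch, not an established tool: the names `IUPAC_RNA` and `consensus_to_regex` are hypothetical, and treating lower case (variable) positions as wildcards and dropping the parenthesized prefix are assumptions made for illustration.

```python
import re

# Subset of the IUPAC ambiguity codes, written for RNA (U instead of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",    # purine: adenine or guanine
    "Y": "[CU]",    # pyrimidine
    "N": "[ACGU]",  # any base
}

def consensus_to_regex(consensus):
    """Compile a consensus string into a regex over conserved positions.

    Assumptions (illustrative only):
      * the parenthesized part is of uncertain significance, so it is dropped;
      * lower case letters mark variable positions, so they match any base;
      * upper case letters are conserved and are expanded via IUPAC_RNA.
    """
    core = re.sub(r"\([^)]*\)", "", consensus)  # remove "(gcc)"
    parts = []
    for ch in core:
        if ch.islower():
            parts.append("[ACGU]")
        else:
            parts.append(IUPAC_RNA[ch])
    return re.compile("".join(parts))

pattern = consensus_to_regex("(gcc)gccRccAUGG")
hit = pattern.search("AAAGCCACCAUGGCU")
print(hit.group(0) if hit else None)
```

With this input the pattern only constrains the R position and the AUGG; a stricter matcher could also score the lower case positions instead of ignoring them.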

Kozak's paper was limited to a subset of vertebrates (human, cow, cat, dog, chicken, guinea pig, hamster, mouse, pig, rabbit, sheep, and Xenopus).

Introduction

This sequence on an mRNA molecule is recognized by the ribosome as the translational start site, from which the protein encoded by that mRNA is synthesized. The ribosome requires this sequence, or a possible variation (see below), to initiate translation. The Kozak sequence is not to be confused with the ribosomal binding site (RBS), which is either the 5' cap of a messenger RNA or an internal ribosome entry site (IRES).

In vivo, the consensus is often not matched exactly, and the amount of protein synthesized from a given mRNA depends on the strength of its Kozak sequence.[4] Some nucleotides in this sequence are more important than others: the AUG is most important because it is the actual initiation codon, encoding a methionine at the N-terminus of the protein. (Rarely, CUG is used as an initiation codon, encoding a leucine instead of the typical methionine.) The A nucleotide of the "AUG" is designated position +1, and the nucleotide immediately preceding it is -1; there is no position 0. For a 'strong' consensus, the nucleotides at positions +4 (G in the consensus) and -3 (either A or G in the consensus) must both match the consensus. An 'adequate' consensus matches at only one of these two positions, while a 'weak' consensus matches at neither. The cc at -1 and -2 are not as conserved, but contribute to the overall strength.[5] There is also evidence that a G in the -6 position is important in the initiation of translation.[1]
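
The strong/adequate/weak rule described above can be sketched in a few lines. This is an illustrative sketch only: the function names `kozak_offset` and `classify_kozak` are hypothetical, the input is assumed to be an mRNA (or cDNA) string with a known index for the A of the start codon, and a position that falls outside the sequence is treated as not matching. The contributions of -1, -2 and -6 are not modelled.

```python
def kozak_offset(position):
    """Convert a Kozak position to an offset from the A of AUG.

    The A is +1 and the base before it is -1; there is no position 0.
    """
    if position == 0:
        raise ValueError("there is no position 0")
    return position - 1 if position > 0 else position

def classify_kozak(mrna, aug_index):
    """Classify the context of the AUG starting at aug_index (0-based)."""
    seq = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")

    i_minus3 = aug_index + kozak_offset(-3)
    i_plus4 = aug_index + kozak_offset(+4)

    # -3 must be a purine (A or G); +4 must be G.
    minus3_ok = i_minus3 >= 0 and seq[i_minus3] in "AG"
    plus4_ok = i_plus4 < len(seq) and seq[i_plus4] == "G"

    matches = int(minus3_ok) + int(plus4_ok)
    return {2: "strong", 1: "adequate", 0: "weak"}[matches]

print(classify_kozak("GCCACCAUGG", 6))  # purine at -3 and G at +4
print(classify_kozak("GCCUCCAUGG", 6))  # only +4 matches
print(classify_kozak("GCCUCCAUGA", 6))  # neither matches
```

The three calls print "strong", "adequate" and "weak" respectively.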

There are examples in vivo of each of these types of Kozak consensus, and they probably evolved as yet another mechanism of gene regulation. Lmx1b is an example of a gene with a weak Kozak consensus sequence.[6] For initiation of translation from such a site, other features are required in the mRNA sequence in order for the ribosome to recognize the initiation codon.

A sequence logo showing the most conserved bases around the initiation codon from 10 000 human mRNAs.

Mutations

Research has shown that a G→C mutation at the -6 position of the β-globin gene (β+45; human) altered the haematological and biosynthetic phenotype. This was the first mutation found in the Kozak sequence. It was identified in a family from southeast Italy whose members suffered from thalassaemia intermedia.[1]
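
To make the position numbering in such a mutation concrete, the sketch below applies a single-base substitution at a Kozak position. It is illustrative only: `apply_substitution` is a hypothetical helper, and the example string is the vertebrate consensus written out with A at -3, not the actual β-globin sequence.

```python
def apply_substitution(seq, aug_index, position, ref, alt):
    """Return seq with the base at a Kozak position changed from ref to alt.

    Positions are relative to the A of AUG (+1); the base before it is -1
    and there is no position 0. aug_index is the 0-based index of that A.
    """
    if position == 0:
        raise ValueError("there is no position 0")
    offset = position - 1 if position > 0 else position
    i = aug_index + offset
    if not 0 <= i < len(seq):
        raise IndexError("position falls outside the sequence")
    if seq[i].upper() != ref.upper():
        raise ValueError(f"expected {ref} at position {position}, found {seq[i]}")
    return seq[:i] + alt + seq[i + 1:]

# Illustrative context (assumption): consensus gccgccAccAUGG in upper case.
context = "GCCGCCACCAUGG"
aug_index = context.index("AUG")  # 9

mutant = apply_substitution(context, aug_index, -6, "G", "C")
print(context)
print(mutant)  # GCCCCCACCAUGG
```

Checking the reference base before substituting guards against off-by-one errors, which are easy to make because the numbering skips zero.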

Variations in the consensus sequence

(gcc)gccRccAUGG
       AGNNAUGN
        ANNAUGG
        ACCAUGG
     GACACCAUGG

Kozak-like sequences in various eukaryotes

Biota                                     Phylum       Consensus sequence
Vertebrate                                             gccRccATGG[2]
Fruit fly (Drosophila spp.)               Arthropoda   cAAacATG[7]
Budding yeast (Saccharomyces cerevisiae)  Ascomycota   aAaAaAATGTCt[8]
Slime mold (Dictyostelium discoideum)     Amoebozoa    aaaAAAATGRna[9]
Ciliate                                   Ciliophora   nTaAAAATGRct[9]
Malarial protozoa (Plasmodium spp.)       Apicomplexa  taaAAAATGAan[9]
Toxoplasma (Toxoplasma gondii)            Apicomplexa  gncAaaATGg[10]
Trypanosomatidae                          Euglenozoa   nnnAnnATGnC[9]
Terrestrial plants                                     AACAATGGC[11]

References

  1. [citation missing]
  2. [citation missing]
  3. Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences, NC-IUB, 1984.
  4. [citation missing]
  5. [citation missing]
  6. [citation missing]
  7. [citation missing]
  8. [citation missing]
  9. [citation missing]
  10. [citation missing]
  11. [citation missing]
