Multifactorial Determinants of Protein Expression in Prokaryotic Open Reading Frames
Published in: | Journal of Molecular Biology, Vol. 402, no. 5, pp. 905–918 |
---|---|
Main Authors: | , , |
Format: | Journal Article |
Language: | English |
Published: | England: Elsevier Ltd, 08-10-2010 |
Summary: | A quantitative description of the relationship between protein expression levels and open reading frame (ORF) nucleotide sequences is important for understanding natural systems, designing synthetic systems, and optimizing heterologous expression. Codon identity, mRNA secondary structure, and nucleotide composition within ORFs markedly influence expression levels. Bioinformatic analysis of ORF sequences in 816 bacterial genomes revealed that these features show distinct regional trends. To investigate their effects on protein expression, we designed 285 synthetic genes and determined corresponding expression levels in vitro using Escherichia coli extracts. We developed a mathematical function, parameterized using this synthetic gene data set, which enables computation of protein expression levels from ORF nucleotide sequences. In addition to its practical application in the design of heterologous expression systems, this equation provides mechanistic insight into the factors that control translation efficiency. We found that expression is strongly dependent on the presence of high AU content and low secondary structure in the ORF 5′ region. Choice of high-frequency codons contributes to a lesser extent. The 3′ terminal AU content makes modest, but detectable contributions. We present a model for the effect of these factors on the three phases of ribosomal function: initiation, elongation, and termination.
► Protein expression levels can be calculated from bacterial ORF mRNA sequences.
► Our experimentally calibrated equation provides insights into translation efficiency.
► The sequence in the first 45–60 bases in the ORF is critical for expression.
► The sequence in the last 45–60 bases also contributes, but to a lesser extent.
► Nucleotide composition dominates, followed by RNA structure and high-frequency codons. |
---|---|
ISSN: | 0022-2836 1089-8638 |
DOI: | 10.1016/j.jmb.2010.08.010 |
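
The summary singles out AU content in the first and last 45–60 bases of the ORF as a determinant of expression. As a minimal illustrative sketch, and not the parameterized equation from the article (which this record does not reproduce), the AU fraction of those terminal windows could be computed along the following lines. The function names and the 48-base default window are assumptions chosen for illustration only.

```python
def au_fraction(seq: str) -> float:
    """Return the fraction of A/U bases in seq (T is treated as U).

    Returns 0.0 for an empty sequence.
    """
    seq = seq.upper().replace("T", "U")
    if not seq:
        return 0.0
    return sum(base in "AU" for base in seq) / len(seq)


def terminal_au_content(orf: str, window: int = 48):
    """Return (5' AU fraction, 3' AU fraction) over `window` bases at each end.

    The default of 48 bases is a hypothetical choice inside the 45-60 base
    range mentioned in the summary; it is not a value taken from the article.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return au_fraction(orf[:window]), au_fraction(orf[-window:])


# Toy ORF (made up for illustration): AU-rich start, GC-rich middle.
orf = "ATGAAAAAATTAACC" + "GGCGCC" * 20 + "AAATTATAA"
five_prime, three_prime = terminal_au_content(orf, window=15)
print(f"5' AU fraction: {five_prime:.2f}, 3' AU fraction: {three_prime:.2f}")
```

A fuller treatment would also need an estimate of mRNA secondary structure and a codon-frequency term, which the summary names as the other contributing factors; those are left out here because they depend on tools and weights that this record does not specify.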