Prevalence of transcription promoters within archaeal operons and coding sequences
Published in: | Molecular systems biology Vol. 5; no. 1; pp. 285 - n/a |
---|---|
Format: | Journal Article |
Language: | English |
Published: | London: Nature Publishing Group UK, 16-06-2009; John Wiley & Sons, Ltd; EMBO Press; Wiley; Nature Publishing Group; Springer Nature |
Summary: | Despite the knowledge of complex prokaryotic‐transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well‐defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome‐wide characterization of transcript structures of ∼64% of all genes, including putative non‐coding RNAs in Halobacterium salinarum NRC‐1. Our integrative analysis of transcriptome dynamics and protein–DNA interaction data sets showed widespread environment‐dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3′ ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non‐functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.
Synopsis
Evidence is mounting that the standard model of transcription factor (TF) binding to intergenic regions is not always the rule. Although there is isolated prior evidence for functional consequences of TF binding inside coding sequences, this issue had not been systematically evaluated genome wide. We have conducted a study to investigate the genome‐wide consequence of internal TF binding for nearly 10% of all TFs in an archaeal extremophile, Halobacterium salinarum NRC‐1. We show that a significant number of TF‐binding sites (TFBS) inside the coding sequences are functional and have marked consequences, such as conditionally modulating the architecture of at least 43% of all operons in this organism. We present the integrated analysis of complementary systems‐wide data on TFBS locations and dynamic modulation of transcriptome structure that led to this striking discovery.
Using ChIP–chip and the MeDiChI algorithm (Reiss et al, 2008), we precisely located TFBSs and determined their corresponding local false discovery rates (LFDRs) from new and previously reported genome‐wide ChIP–chip measurements for 11 TFs: all TFBs (TFBa, TFBb, TFBc, TFBd, TFBe, TFBf and TFBg), one TBP (TBPb) and three transcriptional regulators (TRs) (Trh3, Trh4, VNG1451C) in H. salinarum NRC‐1. Our conclusion from this analysis was that as many as 10% of all multi‐TFBS loci were within coding regions.
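The classification step described above (deciding whether a binding-site locus lies within a coding region) can be sketched roughly as follows. This is a minimal illustration only: the coordinates and the helper names `merge_intervals`, `is_inside_coding` and `fraction_inside_coding` are hypothetical, and the sketch does not reproduce MeDiChI or the LFDR estimation used in the study.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# (start, end) chromosomal coordinates, inclusive; values below are made up.
Interval = Tuple[int, int]


def merge_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Merge overlapping coding intervals so each position is covered once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_inside_coding(site: int, merged: Sequence[Interval]) -> bool:
    """Return True if a binding-site coordinate lies in any merged interval."""
    starts = [start for start, _ in merged]
    idx = bisect_right(starts, site) - 1  # last interval starting at or before site
    return idx >= 0 and merged[idx][0] <= site <= merged[idx][1]


def fraction_inside_coding(sites: Sequence[int],
                           coding: Sequence[Interval]) -> float:
    """Fraction of binding-site loci that fall within coding regions."""
    if not sites:
        return 0.0
    merged = merge_intervals(coding)
    inside = sum(is_inside_coding(site, merged) for site in sites)
    return inside / len(sites)


if __name__ == "__main__":
    # Hypothetical gene coordinates and binding-site positions.
    coding_regions = [(100, 400), (350, 600), (900, 1200)]
    binding_sites = [50, 150, 500, 700, 1000]
    print(fraction_inside_coding(binding_sites, coding_regions))
```

A real analysis would also need to handle strand, multiple replicons, and the positional uncertainty of each site, none of which this sketch attempts.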
To show that these TFBS have significant functional consequences on transcriptional regulation and cellular physiology, we used high‐density genome tiling arrays to analyze the transcriptome structure (TS) of H. salinarum NRC‐1 at different phases of growth in a batch culture, which is associated with differential regulation of over 65% of all genes. Through this analysis we assigned transcription start sites (TSSs) to 64% of all annotated genes, termination sites (TTSs) to 46% of the genes, verified the expression of 203 operons and discovered 5′ and 3′ UTRs for ∼65% of all genes and operons. Further, by correlating the transcribed units with chromosomal coordinates of predicted genes (Ng et al, 2000) and experimentally mapped peptides from large‐scale proteomics studies (Van et al, 2008), we revised the translation start site for 61 genes, detected 10 new protein‐coding genes, and discovered 61 new putative ncRNAs. Although the physiological roles and mechanisms of action of specific ncRNAs remain to be uncovered, the bimodal distribution of correlations between the expression of ncRNAs and that of their antisense strands is consistent with the characterized roles of ncRNAs in the regulation of their cognate antisense transcripts. Finally, this analysis also showed a large mRNA population that has variable 3′‐end locations and transcripts with extensive overlaps in their 3′ termini.
By integrating TFBS locations with the TS, we identified internal binding sites that are functional in the conditional modulation of operon organization. We assessed the global prevalence of such operons by devising a quantitative measure for classifying operons as conditional. Specifically, by integrating probe intensities of transcripts hybridized to the genome tiling array with gene‐expression correlations derived from expression analysis of H. salinarum NRC‐1 in 719 microarray experiments, we found that 43% of all operons are conditionally modulated. Remarkably, there was a strong functional link between transcription‐factor binding inside operons and their classification as ‘conditional’ (P < 10⁻⁹). We transcriptionally fused two of these conditionally activated promoters inside coding sequences to a reporter gene encoding a fast‐degrading GFP variant optimized for the high‐salt cytoplasm of halophilic archaea. FACS analysis of cells harboring these internal promoter–reporter transcriptional fusions provided in vivo validation of growth‐phase regulated transcription initiation inside coding sequences.
Although earlier studies have discovered internal promoters within a single gene or operon (Tsui et al, 1994; Guillot and Moran, 2007), we have significantly extended these findings to a genome‐wide scale to show that biologically meaningful promoters do exist inside coding sequences at a frequency that is much higher than was previously appreciated. Further, this discovery also shows how a simple prokaryote can use the same set of genes in different combinations to elicit complex responses according to an environmental challenge.
Irrespective of the specific underlying mechanisms, our observations of widespread modulation of operon architecture, as well as transcription initiation and termination inside genes, all constitute evidence that archaea can intersperse regulatory logic within their coding sequence and thus blur the boundaries between coding and non‐coding elements. We have shown that it is possible to use new high‐throughput technologies to find these biologically important instances where transcriptional regulation does occur within coding sequences and, furthermore, that it is possible to globally characterize specific regulatory mechanisms responsible for these phenomena. Combined with new high‐throughput sequencing technologies, our results will expand the view of genetic‐information processing that can be investigated at high resolution (Nagalakshmi et al, 2008; Wilhelm et al, 2008). These data will enable construction of mechanistically accurate models for reliable systems re‐engineering of biological circuits. Moreover, these findings suggest that the incorporation of mechanistic accuracy into GRN models would require operons, promoters, and terminators to be treated as dynamic entities.
A systematic evaluation of transcription factor binding site loci (TFBS) for nearly 10% of all TFs in Halobacterium salinarum NRC‐1 via ChIP‐chip demonstrated that a significant fraction of TFBS loci (as many as ~10% of multi‐TFBS loci for 11 TFs) fell within coding regions.
By correlating the dynamic changes in the transcriptome structure (TS) of H. salinarum NRC‐1 during a complex cellular response with genome‐wide binding locations of TFs and peptides from proteomics experiments, we have (i) characterized transcription start sites and termination sites for ~64% of all genes in this organism; and discovered (ii) new protein coding genes, (iii) 61 novel ncRNA candidates, (iv) 5′ and 3′ untranslated regions (UTRs) of mRNAs, (v) a large mRNA population with variable 3′ end locations, and (vi) transcripts with extensive overlaps in their 3′ termini.
By integrating TFBS locations with the TS, we demonstrate that a significant number of TF binding events inside coding regions are indeed functional with important consequences such as in mediating conditional modulation of at least 43% of all investigated operons (P < 10⁻⁹).
These findings suggest that the construction of a mechanistically accurate model of a gene regulatory network would have to consider operons, promoters, and terminators as dynamically changing elements. |
---|---|
Bibliography: | FG02-07ER64327; USDOE Office of Science (SC), Biological and Environmental Research (BER), Biological Systems Science Division. These authors contributed equally to this work. Present address: Departamento de Bioquímica e Imunologia, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Brazil. E-mail: tiekoide@gmail.com |
ISSN: | 1744-4292 |
DOI: | 10.1038/msb.2009.42 |