Joint learning improves protein abundance prediction in cancers

Bibliographic Details
Published in:	BMC biology Vol. 17; no. 1; p. 107
Main Authors:	Li, Hongyang, Siddiqui, Omer, Zhang, Hongjiu, Guan, Yuanfang
Format:	Journal Article
Language:	English
Published:	England: BioMed Central Ltd, 23-12-2019
Subjects:	Analysis; Breast; Breast cancer; Breast Neoplasms - genetics; Cancer; Deoxyribonucleic acid; DNA; Female; Gene expression; Genes; Genetic research; Humans; Information flow; Information management; Information processing; Learning; Machine Learning; Messenger RNA; Metabolism; Methodology; Ovarian cancer; Ovarian Neoplasms - genetics; Performance prediction; Protein expression; Proteins; Proteome - genetics; Proteomes; Proteomics; Proteomics - methods; Regulatory mechanisms (biology); RNA; Survival analysis; Tissues; Transcriptome - genetics; Transcriptomics

Description
Summary:	The classic central dogma in biology describes the flow of information from DNA to mRNA to protein, yet complicated regulatory mechanisms underlying protein translation often lead to weak correlations between mRNA and protein abundances. This is particularly the case in cancer samples and when evaluating the same gene across multiple samples. Here, we report a method for predicting the proteome from the transcriptome, using a training dataset provided by NCI-CPTAC and TCGA that consists of transcriptome and proteome data from 77 breast and 105 ovarian cancer samples. First, we establish a generic model capturing the correlation between mRNA and protein abundance of a single gene. Second, we build a gene-specific model capturing the interdependencies among multiple genes in a regulatory network. Third, we create a cross-tissue model by jointly learning the shared regulatory networks and pathways across cancer tissues. Our method ranked first in the NCI-CPTAC DREAM Proteogenomics Challenge, and its predictive performance is close to the accuracy of experimental replicates. The analysis revealed key functional pathways and network modules controlling proteomic abundance in cancers, in particular metabolism-related genes. We present a method to predict the proteome from the transcriptome, leveraging data from different cancer tissues to build a trans-tissue model, and suggest how to integrate information from multiple cancers to provide a foundation for further research.
ISSN:	1741-7007
DOI:	10.1186/s12915-019-0730-9
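
Illustration: the three model components named in the summary (generic, gene-specific, cross-tissue) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The summary does not name the learner, so ridge regression is swapped in here for simplicity. The data are synthetic, only the sample counts (77 and 105) come from the record, and every function name and the equal-weight averaging at the end are hypothetical choices made for illustration.

```python
import numpy as np

def fit_generic(mrna, protein):
    """One slope/intercept shared by all genes: protein ~ a * own mRNA + b."""
    a, b = np.polyfit(mrna.ravel(), protein.ravel(), deg=1)
    return a, b

def predict_generic(model, mrna):
    a, b = model
    return a * mrna + b

def fit_gene_specific(mrna, protein, alpha=1.0):
    """Ridge regression per gene, using the mRNA of all genes as features."""
    n_samples, n_genes = mrna.shape
    x = np.hstack([mrna, np.ones((n_samples, 1))])  # append intercept column
    penalty = alpha * np.eye(n_genes + 1)
    penalty[-1, -1] = 0.0  # leave the intercept unpenalised
    # Closed-form ridge solution; column j holds the weights for gene j.
    return np.linalg.solve(x.T @ x + penalty, x.T @ protein)

def predict_gene_specific(weights, mrna):
    x = np.hstack([mrna, np.ones((mrna.shape[0], 1))])
    return x @ weights

def mean_gene_correlation(pred, obs):
    """Pearson r of each gene across samples, averaged over genes."""
    rs = [np.corrcoef(pred[:, j], obs[:, j])[0, 1] for j in range(obs.shape[1])]
    return float(np.mean(rs))

# Synthetic stand-in data (assumption): both tissues share one regulatory matrix.
rng = np.random.default_rng(0)
n_genes = 20
w_true = np.eye(n_genes) + rng.normal(scale=0.3, size=(n_genes, n_genes))

def simulate(n_samples):
    mrna = rng.normal(size=(n_samples, n_genes))
    noise = rng.normal(scale=0.5, size=(n_samples, n_genes))
    return mrna, mrna @ w_true + noise

breast_mrna, breast_prot = simulate(77)      # sample counts taken from the summary
ovarian_mrna, ovarian_prot = simulate(105)
test_mrna, test_prot = simulate(30)          # held-out "breast" samples

# 1) generic model, 2) gene-specific model, 3) cross-tissue model trained on
#    the pooled samples of both tissues.
generic = fit_generic(breast_mrna, breast_prot)
specific = fit_gene_specific(breast_mrna, breast_prot)
cross = fit_gene_specific(np.vstack([breast_mrna, ovarian_mrna]),
                          np.vstack([breast_prot, ovarian_prot]))

# Equal-weight average of the three predictions (hypothetical combination rule).
ensemble = (predict_generic(generic, test_mrna)
            + predict_gene_specific(specific, test_mrna)
            + predict_gene_specific(cross, test_mrna)) / 3.0
score = mean_gene_correlation(ensemble, test_prot)
```

The score here is the per-gene Pearson correlation across held-out samples, which matches the summary's framing of evaluating the same gene across multiple samples. Pooling the two tissues only helps in this toy setup because the simulated tissues share one regulatory matrix by construction.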