Deep Learning for Transcriptomics and Proteomics
Main Author: | |
---|---|
Format: | Dissertation |
Language: | English |
Published: | ProQuest Dissertations & Theses, 01-01-2022 |
Subjects: | |
Summary: | Improvements in sequencing technologies have increased the availability of omics data, such as transcriptomics and proteomics, providing information about various molecular mechanisms from complementary angles. These measurements can be key to gaining a better understanding of phenotype-genotype associations. Machine learning has great potential to capture the relevant signals from these datasets; however, the inherently complex nature of the measurements, where the signals of biological interest are entangled with technical and other biological factors, makes it difficult to apply these methods directly. Our goal in this thesis is to address the fundamental challenges associated with transcriptomics and proteomics data that hinder the application of machine learning models. Specifically, we tackle (1) high dimensionality, i.e., a higher number of features than samples, (2) batch effects and confounders, i.e., signals introduced by technical or biological artefacts, and (3) experimental noise and bias, i.e., inaccuracies in measurements. To solve these problems, we develop three novel deep learning approaches: DeepProfile, AD-AE, and Pepper. DeepProfile is an ensemble of unsupervised neural network models trained to learn lower dimensional embeddings, effectively reducing the dimensionality and complexity of gene expression profiles. By integrating expression profiles from different sources and adopting an interpretable framework, we generate embeddings to investigate cancer mechanisms. AD-AE disentangles the confounding sources of biological or technical variance from the biological signals of interest. Our model consists of an unsupervised neural network to learn lower dimensional embeddings and an adversarial predictor to eliminate confounders. The resulting deconfounded representations improve the accuracy of downstream prediction models and can be successfully transferred across domains. Pepper focuses on proteomics measurements and aims to reduce the effects of sequence-induced bias for the accurate quantification of proteins. We incorporate our biological hypothesis into the loss functions of our neural network approach to predict and correct for sequence-induced bias. This results in a reduction in quantification bias as well as an increase in the correlation between gene and protein expression. We demonstrate that each of these deep learning models can generate more informative and interpretable versions of our datasets. The resulting representations or the denoised measurements facilitate the application of machine learning techniques for the investigation of phenotypic variation and cellular mechanisms, which we hope will lead to a better understanding of underlying biology. |
---|---|
ISBN: | 9798837529290 |