Exploring Gene Expression and Protein Binding Data for Gene Regulation

Bibliographic Details
Main Author: Ferdous, Mohsina Mahmuda
Format: Dissertation
Language: English
Published: ProQuest Dissertations & Theses 01-01-2016
Description
Summary: Gene expression is a tightly controlled process regulated by epigenetic modifications and by a series of interactions between genes and proteins across the genome. High-throughput technologies such as microarrays and chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) have enabled researchers to investigate the expression profiles of large numbers of genes and the locations of protein binding and different epigenetic events at genome-wide scale. To understand the complex mechanisms that regulate gene expression, the computational biology community has proposed many methodologies and tools over the years to integrate protein binding data obtained by ChIP-seq with gene expression data generated by microarray technology. However, integrative analysis is still in its infancy. Effective models that capture the complex characteristics of ChIP-seq data and integrate dynamic interactions between gene expression and regulatory factors across different genomic features are still lacking. This thesis aims to provide robust and reliable methodologies for investigating the relationship between different regulatory mechanisms and gene expression, incorporating improved results from ChIP-seq data and the epigenetic phenomena that are closely related to gene regulation. Here, the Markov Random Field model is adapted to analyse the binding regions of proteins and epigenetic markers from ChIP-seq data, taking into consideration complex characteristics of the data such as spatial dependency and IP efficiency, and it is demonstrated how this model, along with the pre-analysis steps, can improve the binding results. Two models are proposed in which these results are then incorporated into integrative analyses of ChIP-seq and gene expression data.
Several classification techniques are also included in one of the models to find associations between different epigenetic markers, proteins, genomic features and gene expression profiles. The models have been applied to public datasets and the results have been validated. With the proposed models, it is shown how the dynamic interactions between regulatory proteins and gene expression can be investigated by integrating sets of genes regulated at successive time points and under different biological or experimental conditions, as well as protein binding profiles across the genome. If either the gene expression or the protein binding data is missing, as is often the case, studying the relationship between regulatory factors and gene expression with these models will help biologists estimate gene expression from the available epigenetic data or infer the underlying epigenetics from the available gene expression data. In short, this thesis brings together different biological tools, data processing techniques and advanced machine learning techniques in a systematic approach to advancing the state of the art in this important area of epigenetics.