A clustering model for identification of time course gene expression patterns

Bibliographic Details
Published in: 2016 1st International Conference on Biomedical Engineering (IBIOMED), pp. 1-6
Main Authors: Ochieng, Peter Juma; Tarigan, Sri Ita; Didik, Hendrik
Format: Conference Proceeding
Language: English
Published: IEEE 01-10-2016
Description
Summary: Identifying gene expression patterns is critical when studying complex and dynamic biological processes such as gene regulatory functions. Gene expression is a continuous biological phenomenon and can be represented by a continuous function (curve). Genes that follow such continuous functions often share similar functional forms. However, the number and shape of these patterns, and the identities of the genes sharing each functional form, remain unknown. To identify such functional forms, we introduce a clustering model for identification of time course gene expression patterns. The method uses an S-spline approach to model the functional curves and a penalized log-likelihood approach to fit the model. In addition, a rejection-controlled EM algorithm is designed to minimize the error and computational cost of mean curve estimation. Furthermore, the method uses generalized cross-validation to select smoothing parameters and measures clustering uncertainty using the Bayesian information criterion. The utility of the method is illustrated by its application to D. melanogaster life cycle datasets. Simulation results indicate that our method accurately recovers the true functional forms by assigning each gene to a cluster, predicting the mean expression curve, and providing associated 95% confidence bands for each cluster. Based on Gene Ontology term descriptions, the estimated mean curve in each cluster reflects true gene functional annotations with biologically meaningful gene expression patterns. Finally, a comparative evaluation of clustering performance indicates that our method outperforms Fuzzy C-Means and K-Means, with a misclassification rate of 0.1289 and an overall success rate of 98.71%.
DOI: 10.1109/IBIOMED.2016.7869819
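
The summary mentions a rejection-controlled EM algorithm but does not spell out the rule. The sketch below shows one common formulation of rejection control applied to the E-step membership probabilities of a single gene; it is an illustration under stated assumptions, not the authors' implementation. The function name `rejection_controlled_weights`, the threshold value 0.05, and the example probabilities are all hypothetical.

```python
import random


def rejection_controlled_weights(posteriors, threshold, rng):
    """Sparsify one gene's cluster-membership probabilities.

    posteriors: E-step probabilities for a single gene (assumed to sum to 1).
    threshold:  hypothetical cutoff in (0, 1]; not taken from the paper.
    rng:        random.Random instance, so that runs are reproducible.
    """
    adjusted = []
    for p in posteriors:
        if p >= threshold:
            adjusted.append(p)          # large weights are kept unchanged
        elif rng.random() < p / threshold:
            adjusted.append(threshold)  # small weight survives, raised to the cutoff
        else:
            adjusted.append(0.0)        # small weight rejected for this cluster
    total = sum(adjusted)
    if total == 0.0:
        return list(posteriors)         # nothing survived: fall back to the input
    return [w / total for w in adjusted]  # renormalize so the weights sum to 1


# Illustrative values only: one gene, three candidate clusters.
rng = random.Random(0)
weights = rejection_controlled_weights([0.90, 0.08, 0.02], threshold=0.05, rng=rng)
print(weights)
```

Under this formulation, a gene whose weight for a cluster is set to zero can be skipped when that cluster's mean curve is re-estimated in the M-step, which is where the reduction in computational cost would come from.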