GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences
As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nuclesome retention. The generally accepted criteria of CGI rely on...
Saved in:
Published in: | BMC genomics Vol. 18; no. Suppl 4; p. 392 |
---|---|
Main Authors: | , , , |
Format: | Journal Article |
Language: | English |
Published: |
England
BioMed Central Ltd
24-05-2017
BioMed Central BMC |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nuclesome retention. The generally accepted criteria of CGI rely on: (a) %G+C content is ≥ 50%, (b) the ratio of the observed CpG content and the expected CpG content is ≥ 0.6, and (c) the general length of CGI is greater than 200 nucleotides. Most existing computational methods for the prediction of CpG island are programmed on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is < 50%, CpG
/CpG
varies, and the length of CGI ranges from eight nucleotides to a few thousand of nucleotides. It implies that CGI detection is not just a straightly statistical task and some unrevealed rules probably are hidden.
A novel Gaussian model, GaussianCpG, is developed for detection of CpG islands on human genome. We analyze the energy distribution over genomic primary structure for each CpG site and adopt the parameters from statistics of Human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing both sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG can achieve better performance in CGI detection.
Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by the linear statistical method but by the Gaussian energy distribution and accumulation. The parameters of Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. By using the pseudopotential analysis on CpG islands, the novel model is validated on both the real and artificial data sets. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
ISSN: | 1471-2164 1471-2164 |
DOI: | 10.1186/s12864-017-3731-5 |