Identifying DNase I hypersensitive sites using multi-features fusion and F-score features selection via Chou's 5-steps rule

Bibliographic Details
Published in: Biophysical Chemistry, Vol. 253, p. 106227
Main Authors: Liang, Yunyun; Zhang, Shengli
Format: Journal Article
Language: English
Published: Netherlands, Elsevier B.V., 01-10-2019
Description
Summary: DNase I hypersensitive sites (DHSs) are regions of chromatin that are sensitive to cleavage by the DNase I enzyme. Identifying DHSs provides useful insights for discovering functional DNA elements in non-coding sequences in biomedical research. Given the significance of DHSs, it is indispensable to develop an accurate, fast, robust, and high-throughput automated computational model. In this paper, we develop a model named iDHSs-MFF by combining multiple fusion features with an F-score feature selection approach. The multiple fusion features include three auto-correlation descriptors based on the dinucleotide property matrix (DPM) and the trinucleotide property matrix (TPM), together with Pseudo-DPM and Pseudo-TPM. Jackknife cross-validation indicates that the features selected by F-score are effective for identifying DHSs. Experimental results on two benchmark datasets demonstrate that the proposed model outperforms some highly related models. Systematic application of this computational approach will greatly facilitate the analysis of transcriptional regulatory elements. The datasets and Matlab source code are freely available at: https://github.com/shengli0201/Datasets.
• A novel identification model named iDHSs-MFF is proposed based on dinucleotide and trinucleotide property matrices.
• The F-score approach is used for feature selection.
• The iDHSs-MFF model outperforms some highly related models.
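The F-score feature selection mentioned in the summary can be illustrated with a minimal Python sketch. It assumes the commonly used two-class F-score (squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances); the record does not give the authors' exact formula, and their released code is in Matlab, so the function names `f_score` and `select_top_k`, the 0/1 label encoding, and the NumPy implementation are illustrative assumptions rather than the paper's method.

```python
import numpy as np


def f_score(X, y):
    """Per-feature two-class F-score (assumed standard definition).

    X: (n_samples, n_features) feature matrix.
    y: labels, 1 for positive (e.g. DHS) and 0 for negative samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    mean_all = X.mean(axis=0)
    # Between-class term: how far each class mean lies from the overall mean.
    numer = (pos.mean(axis=0) - mean_all) ** 2 + (neg.mean(axis=0) - mean_all) ** 2
    # Within-class term: unbiased sample variance of each class.
    denom = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    # Constant features (zero variance in both classes) get a score of 0.
    return np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)


def select_top_k(X, y, k):
    """Return indices of the k highest-scoring features and all scores."""
    scores = f_score(X, y)
    order = np.argsort(scores)[::-1][:k]
    return order, scores
```

Calling `select_top_k(X, y, k)` on a fused feature matrix returns the column indices to keep. In a setup like the one summarized here, `k` would typically be chosen by re-running cross-validation over increasing values and keeping the best-performing subset, though the record does not state how the authors chose it.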
ISSN: 0301-4622, 1873-4200
DOI: 10.1016/j.bpc.2019.106227