Identifying DNase I hypersensitive sites using multi-features fusion and F-score features selection via Chou's 5-steps rule
Published in: | Biophysical chemistry Vol. 253; p. 106227 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | Netherlands, Elsevier B.V, 01-10-2019 |
Subjects: | |
Summary: | DNase I hypersensitive sites (DHSs) are regions of chromatin that are sensitive to cleavage by the DNase I enzyme. Identifying DHSs provides useful insights for discovering functional DNA elements in non-coding sequences in biomedical research. Given the significance of DHSs, an accurate, fast, robust, and high-throughput automated computational model is indispensable. In this paper, we develop a model named iDHSs-MFF that combines multiple fused features with an F-score feature selection approach. The fused features include three auto-correlation descriptors based on the dinucleotide property matrix (DPM) and the trinucleotide property matrix (TPM), together with Pseudo-DPM and Pseudo-TPM. Jackknife cross-validation indicates that the features selected by F-score are effective for identifying DHSs. Experimental results on two benchmark datasets demonstrate that the proposed model outperforms some highly related models. Systematic application of this computational approach will greatly facilitate the analysis of transcriptional regulatory elements. The datasets and MATLAB source code are freely available at: https://github.com/shengli0201/Datasets.
• A novel identification model named iDHSs-MFF is proposed based on dinucleotide and trinucleotide property matrices. • The F-score approach is used for feature selection. • The iDHSs-MFF model outperforms some highly related models. |
---|---|
ISSN: | 0301-4622 1873-4200 |
DOI: | 10.1016/j.bpc.2019.106227 |
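
The Summary names F-score feature selection without spelling it out. As a rough illustration only, the sketch below ranks features using the commonly used two-class F-score: the squared distances of the class means from the overall mean, divided by the sum of the per-class sample variances. This is not the authors' MATLAB code, and the record does not confirm which F-score variant they used. The function names `f_scores` and `select_top_k`, the 0/1 label encoding, and the toy data are assumptions made for this example.

```python
from statistics import mean, variance


def f_scores(X, y):
    """Return one F-score per feature column for a two-class dataset.

    X is a list of equal-length feature rows; y holds the labels
    (assumed encoding: 1 = positive sample, 0 = negative sample).
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two samples per class")

    scores = []
    for j in range(len(X[0])):
        col_all = [row[j] for row in X]
        col_pos = [row[j] for row in pos]
        col_neg = [row[j] for row in neg]
        m_all, m_pos, m_neg = mean(col_all), mean(col_pos), mean(col_neg)
        # Numerator: how far each class mean lies from the overall mean.
        between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
        # Denominator: sum of the per-class sample variances (n - 1 divisor).
        within = variance(col_pos) + variance(col_neg)
        # Assumption for this sketch: a feature that is constant within
        # both classes gets a score of 0.0 instead of a division by zero.
        scores.append(between / within if within > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k features with the highest F-scores."""
    scores = f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 4.0], [8.0, 5.0], [9.0, 4.0]]
y = [0, 0, 1, 1]
scores = f_scores(X, y)
top = select_top_k(X, y, 1)
```

On the toy data, feature 0 receives a high score and feature 1 a score of zero, so `select_top_k(X, y, 1)` keeps only feature 0. In a pipeline like the one the Summary describes, a ranking of this kind would presumably be applied to the fused descriptor vector before the classifier is trained.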