UMAP-DBP: An Improved DNA-Binding Proteins Prediction Method Based on Uniform Manifold Approximation and Projection

Bibliographic Details
Published in: The Protein Journal, Vol. 40, no. 4, pp. 562-575
Main Authors: Wang, Jinyue; Zhang, Shengli; Qiao, Huijuan; Wang, Jiesheng
Format: Journal Article
Language: English
Published: New York: Springer US, 01-08-2021
Springer
Springer Nature B.V.
Description
Summary: DNA-binding proteins play a vital role in cellular processes, and a high-throughput method for identifying them efficiently is urgently needed. Several machine learning and deep learning methods now offer excellent computational speed and accuracy and are well suited to this task. In this work, a novel predictor of DNA-binding proteins, called UMAP-DBP, is proposed. First, features are extracted from the primary protein sequence using physicochemical distance transformation, profile-based auto-cross covariance, and general series correlation pseudo amino acid composition. Second, uniform manifold approximation and projection (UMAP) and a feature importance score method are applied in sequence for feature selection, the second building on the first. Finally, AdaBoost is used as the classifier and evaluated with the jackknife test. On the BP1075 and BP594 datasets, the jackknife test yielded overall accuracies of 82.97% and 82.14% and Cohen's kappa (CK) values of 0.66 and 0.64, respectively. The results show that UMAP combined with AdaBoost is a feasible method for predicting DNA-binding proteins. This is the first study in which UMAP has been successfully applied to identify DNA-binding proteins. All datasets and code are available at https://github.com/Wang-Jinyue/UMAP-DBP.
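The evaluation protocol named in the summary (jackknife test, overall accuracy, Cohen's kappa) can be illustrated with a minimal sketch. It is not the authors' code, which is in the linked repository. It assumes the jackknife test means leave-one-out cross-validation, and it swaps in a simple nearest-centroid classifier in place of AdaBoost so that the example needs only the standard library. The function names and the toy data are hypothetical.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: fraction of samples where prediction equals label.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal label frequencies.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return 0.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def fit_centroids(X, y):
    """Stand-in classifier (NOT AdaBoost): per-class mean feature vector."""
    counts, sums = Counter(y), {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, row):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def jackknife_predict(X, y, fit, predict):
    """Leave-one-out: hold out each sample once, train on all the others."""
    predictions = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predictions.append(predict(model, X[i]))
    return predictions


# Toy feature vectors; 1 = DNA-binding, 0 = non-binding (made-up values).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.8, 1.1], [0.9, 1.0]]
y = [0, 0, 0, 1, 1, 1]

preds = jackknife_predict(X, y, fit_centroids, predict_centroid)
accuracy = sum(t == p for t, p in zip(y, preds)) / len(y)
print("accuracy:", accuracy, "kappa:", cohen_kappa(y, preds))
```

Because every sample is held out exactly once, the jackknife test gives a single deterministic result for a given dataset and classifier, which makes figures such as those quoted for BP1075 and BP594 directly comparable across methods.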
ISSN: 1572-3887
1573-4943
1875-8355
DOI: 10.1007/s10930-021-10011-y