TMSC-m7G: A transformer architecture based on multi-sense-scaled embedding features and convolutional neural network to identify RNA N7-methylguanosine sites

Bibliographic Details
Published in: Computational and Structural Biotechnology Journal, Vol. 23, pp. 129-139
Main Authors: Zhang, Shengli; Xu, Yujie; Liang, Yunyun
Format: Journal Article
Language: English
Published: Netherlands, Research Network of Computational and Structural Biotechnology, 01-12-2024
Description
Summary: RNA N7-methylguanosine (m7G) is an important chemical modification of RNA molecules that helps maintain RNA function and protein translation. Studying and predicting m7G sites aids in understanding the biological function of RNA and in developing new drug therapy regimens. Machine learning and deep learning techniques have proven effective for predicting m7G sites, improving both accuracy and identification efficiency. In this study, we propose TMSC-m7G, a transformer-based model that integrates natural language processing and deep learning to predict m7G sites. TMSC-m7G replaces traditional word embedding with a combination of multi-sense-scaled token embedding and fixed-position embedding to extract contextual information from sequences. In addition, a convolutional layer is added to the encoder to compensate for the transformer's limited ability to capture local information. The model's robustness and generalization are validated through 10-fold cross-validation and an independent dataset test. The results demonstrate strong performance in comparison with state-of-the-art models: TMSC-m7G reaches an accuracy of 98.70% on the benchmark dataset and 92.92% on the independent dataset. To make the model easy to use, we have developed an online prediction tool, freely accessible at http://39.105.212.81/.
ISSN: 2001-0370
DOI: 10.1016/j.csbj.2023.11.052
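
The summary names three ingredients: token embedding of the sequence at more than one scale, a fixed (non-learned) position embedding, and 10-fold cross-validation. The sketch below illustrates generic versions of each in plain Python. It is not the authors' implementation. Overlapping k-mers for the multiple scales, the sinusoidal encoding from the original Transformer for the fixed position embedding, and contiguous unshuffled folds are all assumptions made for illustration, and every function name is hypothetical.

```python
import math


def kmer_tokens(seq, k):
    """Split a sequence into overlapping k-mers with stride 1.

    Assumption: using several values of k stands in for "multi-scale"
    tokens; the record does not say how TMSC-m7G builds its tokens.
    """
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sinusoidal_position_encoding(length, dim):
    """Fixed position encoding in the style of the original Transformer.

    Assumption: the record only says "fixed-position embedding"; the
    sinusoidal form is one common choice, not a confirmed detail.
    Even columns hold sines, odd columns hold cosines.
    """
    table = []
    for pos in range(length):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def kfold_indices(n_samples, n_folds=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Folds are contiguous and their sizes differ by at most one.
    In practice the samples would usually be shuffled first.
    """
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Example: tokenize a made-up RNA fragment at three scales and build
# a position table with one row per 1-mer token.
example_seq = "GGACUGAUCG"  # hypothetical sequence, not from the dataset
tokens_by_scale = {k: kmer_tokens(example_seq, k) for k in (1, 2, 3)}
positions = sinusoidal_position_encoding(len(tokens_by_scale[1]), dim=8)
```

In a full model, each token would be mapped to a learned vector and the matching row of the position table added to it before the encoder. That step is omitted here because the record gives no dimensions or vocabulary.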