Leveraging machine learning for taxonomic classification of emerging astroviruses

Bibliographic Details
Published in: Frontiers in Molecular Biosciences Vol. 10; p. 1305506
Main Authors: Alipour, Fatemeh, Holmes, Connor, Lu, Yang Young, Hill, Kathleen A, Kari, Lila
Format: Journal Article
Language: English
Published: Switzerland Frontiers Media S.A 11-01-2024
Description
Summary: Astroviruses are a family of genetically diverse viruses associated with disease in humans and birds, with significant health effects and economic burdens. Astrovirus taxonomic classification includes two genera, Mamastrovirus and Avastrovirus. However, with next-generation sequencing, broader interspecies transmission has been observed, necessitating a reexamination of the current host-based taxonomic classification approach. In this study, a novel taxonomic classification method is presented for emergent and as yet unclassified astroviruses, based on whole genome sequence k-mer composition in addition to host information. An optional component responsible for identifying recombinant sequences was added to the method's pipeline to counteract the impact of genetic recombination on viral classification. The proposed three-pronged classification method consists of a supervised machine learning method, an unsupervised machine learning method, and the consideration of host species. Using this three-pronged approach, we propose genus labels for 191 as yet unclassified astrovirus genomes. Genus labels are also suggested for an additional eight as yet unclassified astrovirus genomes for which incompatibility was observed with the host species, suggesting cross-species infection. Lastly, our machine learning-based approach, augmented by principal component analysis (PCA), provides evidence supporting the hypothesis of the existence of a human astrovirus (HAstV) subgenus of the genus Mamastrovirus, and a goose astrovirus (GoAstV) subgenus of the genus Avastrovirus. Overall, this multipronged machine learning approach provides a fast, reliable, and scalable method for predicting taxonomic labels, able to keep pace with emerging viruses and the exponential increase in the output of modern genome sequencing technologies.
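As an illustration of what "k-mer composition" means in the summary above, the following minimal Python sketch turns a genome sequence into a fixed-length vector of relative k-mer frequencies, the kind of numeric representation that supervised and unsupervised learners can consume. It is not the authors' pipeline: the function name `kmer_frequencies`, the choice of k, the A/C/G/T alphabet (sequences stored as cDNA), and the rule of skipping windows that contain ambiguous bases are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 3) -> dict[str, float]:
    """Return the relative frequency of every possible k-mer over A/C/G/T.

    Hypothetical helper for illustration; windows containing any other
    symbol (for example N) are skipped rather than counted.
    """
    alphabet = "ACGT"
    valid = set(alphabet)
    sequence = sequence.upper()

    # Count each length-k window that consists only of unambiguous bases.
    counts = Counter(
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if set(sequence[i:i + k]) <= valid
    )
    total = sum(counts.values())

    # Emit all 4**k k-mers in a fixed order so that vectors computed from
    # different genomes are directly comparable.
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {
        kmer: (counts[kmer] / total if total else 0.0)
        for kmer in all_kmers
    }


# Toy input, not a real astrovirus genome.
profile = kmer_frequencies("ACGTACGTNNACGT", k=2)
```

The values of `profile`, taken in key order, form one feature vector per genome; stacking such vectors gives a matrix that could be passed to a classifier, a clustering method, or PCA.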
These authors have contributed equally to this work and share first authorship
Pandurang Kolekar, St. Jude Children’s Research Hospital, United States
Edited by: Jayaraman Valadi, Flame University, India
Reviewed by: Indira Ghosh, Jawaharlal Nehru University, India
ISSN: 2296-889X
DOI: 10.3389/fmolb.2023.1305506