Extensible benchmarking of methods that identify and quantify polyadenylation sites from RNA-seq data

Bibliographic Details
Published in: RNA (Cambridge), Vol. 29, no. 12, pp. 1839-1855
Main Authors: Bryce-Smith, Sam, Burri, Dominik, Gazzara, Matthew R, Herrmann, Christina J, Danecka, Weronika, Fitzsimmons, Christina M, Wan, Yuk Kei, Zhuang, Farica, Fansler, Mervin M, Fernández, José M, Ferret, Meritxell, Gonzalez-Uriarte, Asier, Haynes, Samuel, Herdman, Chelsea, Kanitz, Alexander, Katsantoni, Maria, Marini, Federico, McDonnel, Euan, Nicolet, Ben, Poon, Chi-Lam, Rot, Gregor, Schärfen, Leonard, Wu, Pin-Jou, Yoon, Yoseop, Barash, Yoseph, Zavolan, Mihaela
Format: Journal Article
Language: English
Published: Cold Spring Harbor Laboratory Press, United States, 01-12-2023
Description
Summary: The tremendous rate at which data are generated and analysis methods emerge makes it increasingly difficult to keep track of their domain of applicability, assumptions, limitations, and consequently, of the efficacy and precision with which they solve specific tasks. Therefore, there is an increasing need for benchmarks, and for the provision of infrastructure for continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools for the identification and quantification of the usage of alternative polyadenylation (APA) sites from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3'-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows for continuous extension of the set of methods, metrics, and challenges. We envisage that our analyses will assist researchers in selecting the appropriate tools for their studies, while the containers and reproducible workflows could easily be deployed and extended to evaluate new methods or data sets.
These authors contributed equally to this work.
ISSN: 1355-8382
EISSN: 1469-9001
DOI: 10.1261/rna.079849.123