Massive mining of publicly available RNA-seq data from human and mouse

Bibliographic Details
Published in: Nature Communications Vol. 9; no. 1; pp. 1366 - 10
Main Authors: Lachmann, Alexander; Torre, Denis; Keenan, Alexandra B.; Jagodnik, Kathleen M.; Lee, Hoyjin J.; Wang, Lily; Silverstein, Moshe C.; Ma’ayan, Avi
Format: Journal Article
Language: English
Published: London Nature Publishing Group UK 10-04-2018
Nature Publishing Group
Nature Portfolio
Description
Summary: RNA sequencing (RNA-seq) is the leading technology for genome-wide transcript quantification. However, publicly available RNA-seq data is currently provided mostly in raw form, a significant barrier for global and integrative retrospective analyses. ARCHS4 is a web resource that makes the majority of published RNA-seq data from human and mouse available at the gene and transcript levels. To develop ARCHS4, available FASTQ files from RNA-seq experiments in the Gene Expression Omnibus (GEO) were aligned using a cloud-based infrastructure. In total, 187,946 samples are accessible through ARCHS4: 103,083 mouse and 84,863 human. Additionally, the ARCHS4 web interface provides intuitive exploration of the processed data through querying tools, interactive visualization, and gene pages that provide average expression across cell lines and tissues, top co-expressed genes for each gene, and predicted biological functions and protein–protein interactions for each gene based on prior knowledge combined with co-expression.
Editor's summary: Publicly available RNA-seq data is provided mostly in raw form, resulting in a barrier for integrative analyses. Here, Lachmann et al. develop a high-throughput processing infrastructure and search database (ARCHS4) that provides processed RNA-seq data for 187,946 publicly available mouse and human samples to support exploration and reuse.
ISSN: 2041-1723
DOI: 10.1038/s41467-018-03751-6