Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data

Bibliographic Details
Published in:	Cell reports (Cambridge) Vol. 36; no. 4; p. 109442
Main Authors:	Yang, Yang; Sun, Hongjian; Zhang, Yu; Zhang, Tiefu; Gong, Jialei; Wei, Yunbo; Duan, Yong-Gang; Shu, Minglei; Yang, Yuchen; Wu, Di; Yu, Di
Format:	Journal Article
Language:	English
Published:	United States: Elsevier Inc., 27-07-2021
Subjects:	Algorithms; bulk transcriptomics; Cluster Analysis; clustering structure; Data Analysis; Databases, Genetic; dimensionality reduction; Gene Expression Profiling; heterogeneity analysis; Humans; PCA; Principal Component Analysis; Reproducibility of Results; t-SNE; UMAP

Description
Summary:	Transcriptomic analysis plays a key role in biomedical research. Linear dimensionality reduction methods, especially principal-component analysis (PCA), are widely used to detect sample-to-sample heterogeneity, while recently developed non-linear methods, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), can efficiently cluster heterogeneous samples in single-cell RNA sequencing analysis. Yet the application of t-SNE and UMAP to bulk transcriptomic analysis, and their comparison with conventional methods, have not yet been carried out. We compare four major dimensionality reduction methods (PCA, multidimensional scaling [MDS], t-SNE, and UMAP) in analyzing 71 large bulk transcriptomic datasets. UMAP is superior to PCA and MDS and shows some advantages over t-SNE in differentiating batch effects, identifying pre-defined biological groups, and revealing in-depth clusters in two-dimensional space. Importantly, UMAP generates sample clusters that uncover biological features and clinical meaning. We recommend deploying UMAP to visualize and analyze sizable bulk transcriptomic datasets to reinforce sample heterogeneity analysis.
	Highlights:
	• Four methods (PCA, MDS, t-SNE, and UMAP) are evaluated on 71 bulk transcriptomic datasets
	• UMAP is overall superior to PCA and MDS and shows some advantages over t-SNE
	• UMAP can efficiently and effectively reveal clusters in two-dimensional space
	• Clusters revealed by UMAP are associated with biological features and clinical traits
	In brief: Yang et al. compare four major dimensionality reduction methods (PCA, MDS, t-SNE, and UMAP) in analyzing large bulk transcriptomic datasets. UMAP is overall superior to PCA and MDS and shows some advantages over t-SNE in differentiating batch effects, identifying pre-defined biological groups, and revealing in-depth clusters in two-dimensional space.
ISSN:	2211-1247
DOI:	10.1016/j.celrep.2021.109442