Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data

Bibliographic Details
Published in: Cell reports (Cambridge), Vol. 36, no. 4, p. 109442
Main Authors: Yang, Yang; Sun, Hongjian; Zhang, Yu; Zhang, Tiefu; Gong, Jialei; Wei, Yunbo; Duan, Yong-Gang; Shu, Minglei; Yang, Yuchen; Wu, Di; Yu, Di
Format: Journal Article
Language: English
Published: United States, Elsevier Inc, 27-07-2021
Description
Summary: Transcriptomic analysis plays a key role in biomedical research. Linear dimensionality reduction methods, especially principal-component analysis (PCA), are widely used to detect sample-to-sample heterogeneity, while recently developed non-linear methods, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), can efficiently cluster heterogeneous samples in single-cell RNA sequencing analysis. Yet the application of t-SNE and UMAP to bulk transcriptomic analysis, and their comparison with conventional methods, have not yet been carried out. We compare four major dimensionality reduction methods (PCA, multidimensional scaling [MDS], t-SNE, and UMAP) in analyzing 71 large bulk transcriptomic datasets. UMAP is overall superior to PCA and MDS and shows some advantages over t-SNE in differentiating batch effects, identifying pre-defined biological groups, and revealing in-depth clusters in two-dimensional space. Importantly, UMAP generates sample clusters that uncover biological features and clinical meaning. We recommend deploying UMAP in visualizing and analyzing sizable bulk transcriptomic datasets to reinforce sample heterogeneity analysis.

Highlights:
• Four methods, PCA, MDS, t-SNE, and UMAP, are evaluated on 71 bulk transcriptomic datasets
• UMAP is overall superior to PCA and MDS and shows some advantages over t-SNE
• UMAP can efficiently and effectively reveal clusters in two-dimensional space
• Clusters revealed by UMAP are associated with biological features and clinical traits

In brief: Yang et al. compare four major dimensionality reduction methods (PCA, MDS, t-SNE, and UMAP) in analyzing large bulk transcriptomic datasets. UMAP is overall superior to PCA and MDS and shows some advantages over t-SNE in differentiating batch effects, identifying pre-defined biological groups, and revealing in-depth clusters in two-dimensional space.
ISSN: 2211-1247
DOI: 10.1016/j.celrep.2021.109442
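
Illustrative note: the summary judges embeddings partly by how well pre-defined biological groups separate in two-dimensional space. The record does not say which metric the authors used, so the sketch below substitutes a mean silhouette width as one plausible way to quantify that idea. The function name `silhouette`, the group labels, and the synthetic coordinates are all assumptions made for illustration. They are not taken from the paper and are not output from UMAP or any other method.

```python
import math
import random


def silhouette(points, labels):
    """Mean silhouette width of 2D points with respect to given group labels.

    Assumed stand-in metric: values near 1 mean groups are well separated,
    values near 0 mean the groups overlap in the embedding.
    """
    n = len(points)
    scores = []
    for i in range(n):
        # Distances from sample i to every other sample, keyed by group label.
        by_group = {}
        for j in range(n):
            if i != j:
                d = math.dist(points[i], points[j])
                by_group.setdefault(labels[j], []).append(d)
        own = by_group.pop(labels[i], None)
        if not own or not by_group:
            # Singleton group, or only one group present: score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the sample's own group
        b = min(sum(d) / len(d) for d in by_group.values())  # nearest other group
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Synthetic 2D coordinates standing in for two hypothetical embeddings
# of the same 40 samples (20 per pre-defined group).
rng = random.Random(0)
labels = ["A"] * 20 + ["B"] * 20

# Embedding 1: the two groups form tight, distant clusters.
separated = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(20)]
separated += [(rng.gauss(5, 0.3), rng.gauss(0, 0.3)) for _ in range(20)]

# Embedding 2: all samples drawn from one cloud, so the groups overlap.
overlapping = [(rng.gauss(0, 1.0), rng.gauss(0, 1.0)) for _ in range(40)]

score_separated = silhouette(separated, labels)
score_overlapping = silhouette(overlapping, labels)
print(f"separated:   {score_separated:.3f}")
print(f"overlapping: {score_overlapping:.3f}")
```

In practice the coordinates would come from running each method on the same expression matrix, and a higher score would suggest that the method places pre-defined groups further apart. A single score of this kind is only a rough proxy and does not reproduce the evaluation reported in the article.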