Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data
Published in: | Cell reports (Cambridge) Vol. 36; no. 4; p. 109442 |
---|---|
Main Authors: | , , , , , , , , , , |
Format: | Journal Article |
Language: | English |
Published: | United States, Elsevier Inc, 27-07-2021 |
Summary: | Transcriptomic analysis plays a key role in biomedical research. Linear dimensionality reduction methods, especially principal-component analysis (PCA), are widely used to detect sample-to-sample heterogeneity, while more recently developed non-linear methods, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), can efficiently cluster heterogeneous samples in single-cell RNA sequencing analysis. However, t-SNE and UMAP have not been applied to bulk transcriptomic analysis or compared with conventional methods in that setting. We compare four major dimensionality reduction methods (PCA, multidimensional scaling [MDS], t-SNE, and UMAP) in analyzing 71 large bulk transcriptomic datasets. UMAP is superior to PCA and MDS and shows some advantages over t-SNE in differentiating batch effects, identifying pre-defined biological groups, and revealing in-depth clusters in two-dimensional space. Importantly, UMAP generates sample clusters that uncover biological features and clinical meaning. We recommend deploying UMAP to visualize and analyze sizable bulk transcriptomic datasets to reinforce sample heterogeneity analysis.
• Four methods, PCA, MDS, t-SNE, and UMAP, are evaluated on 71 bulk transcriptomic datasets
• UMAP is overall superior to PCA and MDS and shows some advantages over t-SNE
• UMAP can efficiently and effectively reveal clusters in two-dimensional space
• Clusters revealed by UMAP are associated with biological features and clinical traits
Yang et al. compare four major dimensionality reduction methods (PCA, MDS, t-SNE, and UMAP) in analyzing large bulk transcriptomic datasets. UMAP is overall superior to PCA and MDS and shows some advantages over t-SNE in differentiating batch effects, identifying pre-defined biological groups, and revealing in-depth clusters in two-dimensional space. |
ISSN: | 2211-1247 |
DOI: | 10.1016/j.celrep.2021.109442 |