Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data

Bibliographic Details
Published in: Molecular & Cellular Proteomics, Vol. 20, p. 100083
Main Authors: Shi, Zhiao; Wen, Bo; Gao, Qiang; Zhang, Bing
Format: Journal Article
Language: English
Published: United States, Elsevier Inc; American Society for Biochemistry and Molecular Biology, 01-01-2021
Description
Summary: Untargeted mass spectrometry (MS)-based proteomics provides a powerful platform for protein biomarker discovery, but clinical translation depends on the selection of a small number of proteins for downstream verification and validation. Due to the small sample size of typical discovery studies, protein markers identified from discovery data may not be generalizable to independent datasets. In addition, a good protein marker identified using a discovery platform may be difficult to implement in verification and validation platforms. Moreover, although multiomics characterization is being increasingly used in discovery cohort studies, there is no existing method for multiomics-facilitated protein biomarker selection. Here, we present ProMS, a computational algorithm for protein marker selection. The algorithm is based on the hypothesis that a phenotype is characterized by a few underlying biological functions, each manifested by a group of coexpressed proteins. A weighted k-medoids clustering algorithm is applied to all univariately informative proteins to identify both coexpressed protein clusters and a representative protein for each cluster as markers. In two clinically important classification problems, ProMS shows superior performance compared with existing feature selection methods. ProMS can be extended to the multiomics setting (ProMS_mo) through a constrained weighted k-medoids clustering algorithm, and the protein panels selected by ProMS_mo show improved performance on independent test data compared with ProMS. In addition to superior performance, ProMS and ProMS_mo also have two unique strengths. First, the feature clusters enable functional interpretation of the selected protein markers. Second, the feature clusters provide an opportunity to select replacement protein markers, facilitating a robust transition to the verification and validation platforms.
In summary, this study provides a unified and effective computational framework for selecting protein biomarkers using proteomics or multiomics data. The software implementation is publicly available at https://github.com/bzhanglab/proms.

Highlights:
• New algorithms enable protein biomarker discovery from proteomics or multiomics data.
• Superior performance is demonstrated in two clinically important classification problems.
• Feature clusters facilitate functional interpretation of the identified protein biomarkers.
• Alternative choices are provided for each identified protein biomarker.

Untargeted mass spectrometry–based proteomics provides a powerful platform for protein biomarker discovery, but clinical translation depends on the selection of a small number of proteins for verification and validation. We present feature selection methods for protein biomarker selection from proteomics or multiomics data. The algorithms show good performance, enable functional interpretation of the identified markers, and provide alternative choices for each identified marker to facilitate a robust transition to the verification and validation platforms.
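The marker-selection idea summarized above (filter univariately informative features, cluster them with weighted k-medoids, take each medoid as a marker, and keep the rest of its cluster as fallbacks) can be illustrated with a small, self-contained Python sketch. It is not the ProMS code; the authors' implementation is in the GitHub repository cited above. Several choices here are assumptions made only for illustration: the univariate score is the absolute Welch t-statistic, the distance between features is 1 - |Pearson r|, the weighted objective is minimized with a PAM-style BUILD/SWAP search, the ProMS_mo constraint is modeled as letting only protein features become medoids while every feature is still clustered, and fallback markers are ranked by |r| with the chosen marker. The function names (`select_markers`, `weighted_k_medoids`, `replacement_candidates`) and the toy data are hypothetical.

```python
# Illustrative sketch, NOT the ProMS implementation. Scoring (|Welch t|),
# distance (1 - |r|), the BUILD/SWAP optimizer and the medoid constraint are assumptions.
import math
import statistics
from typing import Dict, List, Optional, Sequence, Set, Tuple

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def welch_t(values: Sequence[float], labels: Sequence[int]) -> float:
    """Welch t-statistic, class 1 vs class 0 (assumed univariate score)."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    se = math.sqrt(statistics.variance(g1) / len(g1) + statistics.variance(g0) / len(g0))
    return (statistics.mean(g1) - statistics.mean(g0)) / se if se > 0 else 0.0


def weighted_cost(medoids: List[int], dist: Matrix, weights: Sequence[float]) -> float:
    """Sum over features of weight * distance to the nearest medoid."""
    return sum(w * min(dist[i][m] for m in medoids) for i, w in enumerate(weights))


def weighted_k_medoids(dist: Matrix, weights: Sequence[float], k: int,
                       candidates: Optional[Sequence[int]] = None
                       ) -> Tuple[List[int], Dict[int, List[int]]]:
    """PAM-style search; `candidates` restricts which features may become medoids."""
    n = len(weights)
    cand = list(range(n)) if candidates is None else list(candidates)
    medoids: List[int] = []
    for _ in range(k):  # BUILD: greedily add the medoid that lowers the cost most
        medoids.append(min((c for c in cand if c not in medoids),
                           key=lambda c: weighted_cost(medoids + [c], dist, weights)))
    cost = weighted_cost(medoids, dist, weights)
    improved = True
    while improved:  # SWAP: replace a medoid whenever the cost strictly drops
        improved = False
        for pos in range(k):
            for c in cand:
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                trial_cost = weighted_cost(trial, dist, weights)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
    clusters: Dict[int, List[int]] = {m: [] for m in medoids}
    for i in range(n):  # every feature joins its nearest medoid's cluster
        home = i if i in medoids else min(medoids, key=lambda m: dist[i][m])
        clusters[home].append(i)
    return medoids, clusters


def select_markers(X: Dict[str, List[float]], labels: Sequence[int], k: int, top_n: int,
                   medoid_candidates: Optional[Set[str]] = None) -> Dict[str, List[str]]:
    """Return {marker: cluster members}, where each marker is a weighted medoid."""
    scores = {f: abs(welch_t(v, labels)) for f, v in X.items()}
    kept = sorted(X, key=lambda f: scores[f], reverse=True)[:top_n]  # univariate filter
    dist = [[1 - abs(pearson(X[a], X[b])) for b in kept] for a in kept]
    weights = [scores[f] for f in kept]
    cand = (None if medoid_candidates is None
            else [i for i, f in enumerate(kept) if f in medoid_candidates])
    medoids, clusters = weighted_k_medoids(dist, weights, k, cand)
    return {kept[m]: [kept[i] for i in members] for m, members in clusters.items()}


def replacement_candidates(marker: str, members: List[str],
                           X: Dict[str, List[float]], max_n: int = 3) -> List[str]:
    """Rank the marker's cluster-mates by |r| with the marker (assumed criterion)."""
    others = [f for f in members if f != marker]
    return sorted(others, key=lambda f: abs(pearson(X[marker], X[f])), reverse=True)[:max_n]


# Made-up toy data: 8 samples (4 per class), two coexpressed modules A and B.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
X = {
    "prot_A1": [1, 2, 1, 2, 5, 6, 5, 6],
    "prot_A2": [1.2, 1.9, 1.1, 2.3, 5.1, 6.2, 4.9, 5.8],
    "rna_A": [2, 4, 2, 4, 10, 12, 10, 12],
    "prot_B1": [3, 3, 4, 4, 4, 5, 6, 6],
    "prot_B2": [3.1, 2.9, 4.2, 3.8, 4.1, 5.2, 5.9, 6.1],
    "rna_B": [6, 6, 8, 8, 8, 10, 12, 12],
}
proteins = {f for f in X if f.startswith("prot_")}  # only proteins may be markers
panel = select_markers(X, labels, k=2, top_n=6, medoid_candidates=proteins)
for marker, members in panel.items():
    print(marker, "->", members, "| fallbacks:", replacement_candidates(marker, members, X))
```

Because each marker is a cluster medoid, the rest of its cluster forms a natural pool of substitutes if that protein proves hard to measure on a verification platform; ordering the pool by |r| with the marker is only one plausible choice.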
ISSN: 1535-9476, 1535-9484
DOI: 10.1016/j.mcpro.2021.100083