INFERYS rescoring: Boosting peptide identifications and scoring confidence of database search results

Bibliographic Details
Published in: Rapid Communications in Mass Spectrometry, p. e9128
Main Authors: Zolg, Daniel P, Gessulat, Siegfried, Paschke, Carmen, Graber, Michael, Rathke-Kuhnert, Magnus, Seefried, Florian, Fitzemeier, Kai, Berg, Frank, Lopez-Ferrer, Daniel, Horn, David, Henrich, Christoph, Huhmer, Andreas, Delanghe, Bernard, Frejno, Martin
Format: Journal Article
Language: English
Published: England 28-06-2021
Description
Summary: Database search engines for bottom-up proteomics largely ignore peptide fragment ion intensities during the automated scoring of tandem mass spectra against protein databases. Recent advances in deep learning allow the accurate prediction of peptide fragment ion intensities. Using these predictions to calculate additional intensity-based scores helps to overcome this drawback. Here, we describe a processing workflow termed INFERYS™ rescoring for the intensity-based rescoring of Sequest HT search engine results in Thermo Scientific™ Proteome Discoverer™ 2.5 software. The workflow is based on the deep learning platform INFERYS capable of predicting fragment ion intensities, which runs on personal computers without the need for graphics processing units. This workflow calculates intensity-based scores comparing peptide spectrum matches from Sequest HT and predicted spectra. Resulting scores are combined with classical search engine scores for input to the false discovery rate estimation tool Percolator. We demonstrate the merits of this approach by analyzing a classical HeLa standard sample and exemplify how this workflow leads to a better separation of target and decoy identifications, in turn resulting in increased peptide spectrum match, peptide and protein identification numbers. On an immunopeptidome dataset, this workflow leads to a 50% increase in identified peptides, emphasizing the advantage of intensity-based scores when analyzing low-intensity spectra or analytes with very similar physicochemical properties that require vast search spaces. Overall, the end-to-end integration of INFERYS rescoring enables simple and easy access to a powerful enhancement to classical database search engines, promising a deeper, more confident and more comprehensive analysis of proteomic data from any organism by unlocking the intensity dimension of tandem mass spectra for identification and more confident scoring.
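Illustration: the summary does not say which intensity-based scores the workflow computes. As a minimal sketch of the general idea, the Python below compares observed and predicted fragment ion intensities with a normalized spectral angle, a similarity measure often used for this kind of comparison, and places the result next to a classical search engine score in one feature row. The choice of metric, the function names (spectral_angle, build_feature_row) and the feature keys are assumptions made for illustration only. They are not the INFERYS, Sequest HT, Proteome Discoverer or Percolator interfaces.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral angle between two intensity vectors.

    Both vectors must list intensities for the same fragment ions in the
    same order. Returns a value in [0, 1] for non-negative intensities;
    1 means the intensity patterns are identical up to a scale factor.
    Assumption: this metric is used here as an example only.
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must have the same length")

    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_obs = math.sqrt(sum(o * o for o in observed))
    norm_pred = math.sqrt(sum(p * p for p in predicted))
    if norm_obs == 0.0 or norm_pred == 0.0:
        # No signal in one of the spectra: treat as no similarity.
        return 0.0

    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_obs * norm_pred)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def build_feature_row(search_engine_score, observed, predicted):
    """Combine a classical score with an intensity-based score.

    Hypothetical feature layout: a rescoring tool would receive one such
    row per peptide spectrum match, for targets and decoys alike.
    """
    return {
        "search_engine_score": search_engine_score,
        "spectral_angle": spectral_angle(observed, predicted),
    }


if __name__ == "__main__":
    # Made-up intensities for four matched fragment ions.
    observed = [0.10, 0.80, 0.35, 0.00]
    predicted = [0.12, 0.75, 0.40, 0.02]
    print(build_feature_row(2.9, observed, predicted))
```

A match whose observed intensities follow the predicted pattern gets a spectral angle close to 1, while a random (decoy-like) match tends to score lower. That additional separation is what a semi-supervised rescoring step can exploit on top of the classical score.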
ISSN: 0951-4198, 1097-0231
DOI: 10.1002/rcm.9128