Properties of average score distributions of SEQUEST: the probability ratio method

Bibliographic Details
Published in:	Molecular & Cellular Proteomics, Vol. 7, no. 6, pp. 1135–1145
Main Authors:	Martínez-Bartolomé, Salvador; Navarro, Pedro; Martín-Maroto, Fernando; López-Ferrer, Daniel; Ramos-Fernández, Antonio; Villar, Margarita; García-Ruiz, Josefa P; Vázquez, Jesús
Format:	Journal Article
Language:	English
Published:	United States, 1 June 2008
Subjects:	Algorithms; Automation; Computational Biology; Databases, Protein; Humans; Jurkat Cells; Mass Spectrometry - methods; Mesenchymal Stem Cells - metabolism; Models, Statistical; Models, Theoretical; Peptides - chemistry; Probability; Proteomics - methods; Reproducibility of Results; Tandem Mass Spectrometry - methods

Description
Summary:	High throughput identification of peptides in databases from tandem mass spectrometry data is a key technique in modern proteomics. Common approaches to interpreting large scale peptide identification results are based on the statistical analysis of average score distributions, which are constructed from the set of best scores produced by large collections of MS/MS spectra using search engines such as SEQUEST. Other approaches calculate individual peptide identification probabilities on the basis of theoretical models or from single-spectrum score distributions constructed from the set of scores produced by each MS/MS spectrum. In this work, we study the mathematical properties of average SEQUEST score distributions by introducing the concept of spectrum quality and expressing these average distributions as compositions of single-spectrum distributions. We predict, and demonstrate in practice, that average score distributions are dominated by the quality distribution in the spectra collection, except in the low probability region, where the dependence of average probability on database size can be predicted. Our analysis leads to a novel indicator, the probability ratio, which optimally takes into account the statistical information provided by the first and second best scores. The probability ratio is a non-parametric and robust indicator that makes it unnecessary to classify spectra by parameters such as charge state, and it yields better peptide identification performance, measured by false discovery rates, than other empirical statistical approaches. The probability ratio also compares favorably with statistical probability indicators obtained by constructing single-spectrum SEQUEST score distributions.
These results, together with the robustness, conceptual simplicity, and ease of automation of the probability ratio algorithm, make it a very attractive alternative for determining peptide identification confidences and error rates in high throughput experiments.
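The summary does not give the formula for the probability ratio, so the Python sketch below is an illustrative assumption and not the authors' algorithm. It treats a pooled list of best scores as the average score distribution, turns it into an empirical tail probability p(s), and forms the ratio p(best) / p(second) for a single spectrum. The helper names `empirical_tail`, `probability_ratio` and `best_of_n_tail`, the add-one correction, and the toy scores are all hypothetical. The last helper shows the standard best-of-N argument for why, in the low-probability region, the tail probability of a best score grows roughly linearly with the number of candidate peptides, that is, with database size.

```python
import bisect
import math
from typing import Callable, Sequence

# Hypothetical sketch: NOT the published probability-ratio algorithm.


def empirical_tail(pooled_best_scores: Sequence[float]) -> Callable[[float], float]:
    """Build p(s): the (smoothed) fraction of pooled best scores that are >= s.

    The pooled list stands in for an "average" score distribution built from
    the best scores of a whole spectra collection. Using a plain empirical
    tail with an add-one correction is an assumption of this sketch.
    """
    ordered = sorted(pooled_best_scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one pooled score")

    def p(s: float) -> float:
        # Count pooled scores >= s via binary search on the sorted list.
        count_ge = n - bisect.bisect_left(ordered, s)
        # Add-one correction keeps p strictly positive for unseen high scores.
        return (count_ge + 1) / (n + 1)

    return p


def probability_ratio(best: float, second: float, p: Callable[[float], float]) -> float:
    """Hypothetical ratio p(best) / p(second); smaller means a more distinct top hit."""
    return p(best) / p(second)


def best_of_n_tail(p_single: float, n_candidates: int) -> float:
    """P(max of n independent scores >= s), given a per-candidate tail p_single.

    Equals 1 - (1 - p_single) ** n_candidates, which is close to
    n_candidates * p_single when that product is small (linear in database size).
    """
    if p_single >= 1.0:
        return 1.0
    # expm1/log1p avoid cancellation when p_single is tiny.
    return -math.expm1(n_candidates * math.log1p(-p_single))


if __name__ == "__main__":
    pooled = [1.1, 1.4, 1.6, 1.8, 2.0, 2.2, 2.5, 2.9, 3.3, 4.0]  # toy numbers, not real data
    p = empirical_tail(pooled)
    print(probability_ratio(3.5, 2.1, p))
    print(best_of_n_tail(1e-6, 1000))
```

For the toy numbers, the first printed value is 1/3 up to floating-point rounding (2/11 divided by 6/11), and the second is just under 1e-3, close to the linear approximation 1000 × 1e-6.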
ISSN:	1535-9484
DOI:	10.1074/mcp.M700239-MCP200