Detection and Validation of Non-synonymous Coding SNPs from Orthogonal Analysis of Shotgun Proteomics Data

Bibliographic Details
Published in:	Journal of proteome research Vol. 6; no. 6; pp. 2331 - 2340
Main Authors:	Bunger, Maureen K; Cargile, Benjamin J; Sevinsky, Joel R; Deyanova, Ekaterina; Yates, Nathan A; Hendrickson, Ronald C; Stephenson, James L
Format:	Journal Article
Language:	English
Published:	United States: American Chemical Society, 01-06-2007
Subjects:	Amino Acid Sequence; Amino Acid Substitution - genetics; Breast Neoplasms - chemistry; Databases, Protein; Humans; Molecular Sequence Data; Peptides - analysis; Peptides - genetics; Polymerase Chain Reaction; Polymorphism, Single Nucleotide; Proteins - analysis; Proteins - genetics; Proteomics - methods; Sequence Analysis, DNA; Sequence Analysis, Protein

Description
Summary:	Orthogonal analysis of amino acid substitutions as a result of SNPs in existing proteomic datasets provides a critical foundation for the emerging field of population-based proteomics. Large-scale proteomics datasets, derived from shotgun tandem MS analysis of complex cellular protein mixtures, contain many unassigned spectra that may correspond to alternate alleles coded by SNPs. The purpose of this work was to identify tandem MS spectra in LC−MS/MS shotgun proteomics datasets that may represent coding nonsynonymous SNPs (nsSNPs). To this end, we generated a tryptic peptide database created from allelic information found in NCBI's dbSNP. We searched this database with tandem MS spectra of tryptic peptides from DU4475 breast tumor cells that had been fractionated by pI in the first dimension and by reverse-phase LC in the second dimension. In all, we identified 629 nsSNPs, of which 36 were of alternate SNP alleles not found in the reference NCBI or IPI protein databases. Searches for SNP-peptides carry a high risk of false positives due both to mass shifts caused by modifications and to multiple representations of the same peptide within the genome. In this work, false positives were filtered using a novel peptide pI prediction algorithm and characterized using a decoy database developed by random substitution of similarly sized reference peptides. Secondary validation by sequencing of corresponding genomic DNA confirmed the presence of the predicted SNP in 8 of 10 SNP-peptides. This work highlights that the usefulness of interpreting unassigned spectra as polymorphisms is highly reliant on the ability to detect and filter false positives.
Keywords:	LC−MS/MS • single nucleotide polymorphism • false-positives • isoelectric focusing • pI filtering • population proteomics
ISSN:	1535-3893 1535-3907
DOI:	10.1021/pr0700908
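
Illustrative sketch: The Summary describes building a tryptic peptide database from SNP allele information. The Python below is a minimal sketch of that general idea, not the authors' pipeline. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages), a single-residue substitution given as a 1-based position, and a hypothetical minimum peptide length. The function names, the `min_len` parameter, and the example sequence are invented for illustration and do not come from the article or from dbSNP.

```python
def tryptic_digest(protein):
    """Split a protein sequence after K or R, except when followed by P.

    Simplified rule with no missed cleavages (an assumption of this sketch).
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal peptide that does not end in K or R.
        peptides.append(protein[start:])
    return peptides


def snp_peptides(protein, position, alt_residue, min_len=6):
    """Return tryptic peptides that exist only in the variant sequence.

    position is 1-based; min_len is a hypothetical length cutoff.
    """
    idx = position - 1
    variant = protein[:idx] + alt_residue + protein[idx + 1:]
    reference = set(tryptic_digest(protein))
    return [
        pep
        for pep in tryptic_digest(variant)
        if pep not in reference and len(pep) >= min_len
    ]


# Made-up sequence; Y at position 5 is replaced by C.
example = "MKTAYIAKQRQISFVKSHFSR"
print(snp_peptides(example, 5, "C"))  # ['TACIAK']
```

A substitution that creates or removes K or R changes the cleavage pattern itself, so comparing the full variant digest against the reference digest handles that case without special logic. Real searches would also need the false-positive controls the Summary emphasizes, such as a decoy database and pI filtering, which this sketch does not attempt.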