A support vector machine approach to the identification of phosphorylation sites

Bibliographic Details
Published in:	Cellular & molecular biology letters Vol. 10; no. 1; p. 73
Main Authors:	Plewczyński, Dariusz; Tkacz, Adrian; Godzik, Adam; Rychlewski, Leszek
Format:	Journal Article
Language:	English
Published:	England 2005
Subjects:	Algorithms; Computational Biology; Cyclic AMP-Dependent Protein Kinases - metabolism; Databases, Protein; Phosphorylation; Protein Kinase C - metabolism; Proteins - chemistry; Proteins - metabolism

Abstract	We describe a bioinformatics tool that predicts the positions of phosphorylation sites in proteins from sequence information alone. The method is based on support vector machine (SVM) statistical learning theory. Statistical models for phosphorylation by various types of kinases are built from a dataset of short (9-amino-acid) sequence fragments. The fragments are excised around post-translationally modified sites of proteins that are in the current release of the Swiss-Prot database and that were experimentally confirmed to be phosphorylated by any kinase. We represent them as vectors in a multidimensional abstract space of short sequence fragments. Prediction proceeds as follows. First, a given query protein sequence is dissected into overlapping short segments. All the fragments are then projected into the multidimensional space of sequence fragments via a collection of different representations. The resulting points are classified by pre-built statistical models (SVMs with linear, polynomial and radial kernel functions) as either phosphorylated or inactive. The resulting list of plausible sites for phosphorylation by various types of kinases in the query protein is returned to the user. The efficiency of the method for each type of phosphorylation is estimated using leave-one-out tests and is presented here. The sensitivities of the models can exceed 70%, depending on the type of kinase. Additional information from profile representations of short sequence fragments helps to achieve higher accuracy for some phosphorylation types. Further development of an automatic phosphorylation site annotation predictor based on our algorithm should yield a significant improvement when statistical algorithms are used to quantify the results.
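The fragment-and-classify steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The one-hot encoding stands in for the unspecified "collection of different representations", the kernel parameters (degree, coef0, gamma) are arbitrary illustrative values, and the query sequence is hypothetical. The sketch covers only the dissection into overlapping 9-residue fragments, the vector encoding, and the three kernel families named in the abstract; SVM training and leave-one-out testing are not shown.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
WINDOW = 9  # fragment length stated in the abstract


def dissect(sequence, window=WINDOW):
    """Return (start_index, fragment) pairs for every overlapping window."""
    return [(i, sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]


def one_hot(fragment):
    """Encode a fragment as a flat 0/1 vector (an assumed representation)."""
    vector = []
    for residue in fragment:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def linear_kernel(x, y):
    """Plain dot product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """Polynomial kernel; degree and coef0 are illustrative assumptions."""
    return (linear_kernel(x, y) + coef0) ** degree


def radial_kernel(x, y, gamma=0.1):
    """Radial (Gaussian) kernel; gamma is an illustrative assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


query = "MKRRSSLPEAGTYV"  # hypothetical query sequence, not from the paper
fragments = dissect(query)
vectors = [one_hot(fragment) for _, fragment in fragments]
```

With one-hot vectors, the linear kernel simply counts the positions at which two fragments carry the same residue, which is one reason the abstract's profile representations may add information that this encoding lacks.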
Author	Plewczyński, Dariusz (BioInfoBank Institute, Limanowskiego 24A/16, 60-744 Poznań, Poland; darman@bioinfo.pl); Tkacz, Adrian; Godzik, Adam; Rychlewski, Leszek
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/15809681 (MEDLINE/PubMed record)
Discipline	Biology
Genre	Research Support, Non-U.S. Gov't; Journal Article
ISSN	1425-8153
IsPeerReviewed	true
IsScholarly	true
PMID	15809681
PublicationTitleAlternate	Cell Mol Biol Lett