A support vector machine approach to the identification of phosphorylation sites
Published in: | Cellular & molecular biology letters Vol. 10; no. 1; p. 73 |
---|---|
Main Authors: | Plewczyński, Dariusz; Tkacz, Adrian; Godzik, Adam; Rychlewski, Leszek |
Format: | Journal Article |
Language: | English |
Published: | England, 2005 |
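Subjects: | Algorithms; Computational Biology; Cyclic AMP-Dependent Protein Kinases - metabolism; Databases, Protein; Phosphorylation; Protein Kinase C - metabolism; Proteins - chemistry; Proteins - metabolism |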
Abstract | We describe a bioinformatics tool that predicts the positions of phosphorylation sites in proteins from sequence information alone. The method is based on the support vector machine (SVM) approach from statistical learning theory. Statistical models for phosphorylation by various types of kinases are built from a dataset of short (9-amino-acid) sequence fragments. The fragments are excised around post-translationally modified sites of proteins that are in the current release of the Swiss-Prot database and that were experimentally confirmed to be phosphorylated by any kinase. We represent them as vectors in a multidimensional abstract space of short sequence fragments. The prediction method is as follows. First, a given query protein sequence is dissected into overlapping short segments. All the fragments are then projected into the multidimensional space of sequence fragments via a collection of different representations. These points are classified with pre-built statistical models (SVMs with linear, polynomial and radial kernel functions) as either phosphorylated or inactive. The resulting list of plausible sites for phosphorylation by various types of kinases in the query protein is returned to the user. The efficiency of the method for each type of phosphorylation is estimated using leave-one-out tests and presented here. The sensitivities of the models can exceed 70%, depending on the type of kinase. Additional information from profile representations of short sequence fragments improves accuracy for some phosphorylation types. Further development of an automatic phosphorylation site annotation predictor based on our algorithm should yield a significant improvement when statistical algorithms are used to quantify the results. |
---|---|
Author | Tkacz, Adrian; Godzik, Adam; Rychlewski, Leszek; Plewczyński, Dariusz |
Author_xml | – sequence: 1 givenname: Dariusz surname: Plewczyński fullname: Plewczyński, Dariusz email: darman@bioinfo.pl organization: BioInfoBank Institute, Limanowskiego 24A/16, 60-744 Poznań, Poland. darman@bioinfo.pl – sequence: 2 givenname: Adrian surname: Tkacz fullname: Tkacz, Adrian – sequence: 3 givenname: Adam surname: Godzik fullname: Godzik, Adam – sequence: 4 givenname: Leszek surname: Rychlewski fullname: Rychlewski, Leszek |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/15809681$$D View this record in MEDLINE/PubMed |
ContentType | Journal Article |
Discipline | Biology |
ExternalDocumentID | 15809681 |
Genre | Research Support, Non-U.S. Gov't; Journal Article |
ISSN | 1425-8153 |
IngestDate | Tue Oct 15 23:31:28 EDT 2024 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Language | English |
LinkModel | OpenURL |
PMID | 15809681 |
ParticipantIDs | pubmed_primary_15809681 |
PublicationCentury | 2000 |
PublicationDate | 2005-00-00 |
PublicationDateYYYYMMDD | 2005-01-01 |
PublicationDate_xml | – year: 2005 text: 2005-00-00 |
PublicationDecade | 2000 |
PublicationPlace | England |
PublicationPlace_xml | – name: England |
PublicationTitle | Cellular & molecular biology letters |
PublicationTitleAlternate | Cell Mol Biol Lett |
PublicationYear | 2005 |
SourceID | pubmed |
SourceType | Index Database |
StartPage | 73 |
SubjectTerms | Algorithms; Computational Biology; Cyclic AMP-Dependent Protein Kinases - metabolism; Databases, Protein; Phosphorylation; Protein Kinase C - metabolism; Proteins - chemistry; Proteins - metabolism |
Title | A support vector machine approach to the identification of phosphorylation sites |
URI | https://www.ncbi.nlm.nih.gov/pubmed/15809681 |
Volume | 10 |
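
The fragment-based procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`dissect`, `one_hot`, `sensitivity`) are hypothetical, and one-hot encoding is only one plausible choice among the "different representations" that the abstract mentions without specifying. The SVM training and classification step is omitted; the resulting vectors are the kind of input a linear, polynomial or radial-kernel SVM could be trained on.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
WINDOW = 9  # fragment length stated in the abstract


def dissect(sequence, window=WINDOW):
    """Split a protein sequence into all overlapping fragments.

    Returns (center_position, fragment) pairs, where center_position
    is the 1-based index of the fragment's middle residue.
    """
    half = window // 2
    return [
        (start + half + 1, sequence[start:start + window])
        for start in range(len(sequence) - window + 1)
    ]


def one_hot(fragment):
    """Encode a fragment as a flat 0/1 vector, 20 entries per residue.

    Assumption: one-hot encoding stands in for the unspecified
    representations. Non-standard residues become all-zero blocks.
    """
    vector = []
    for residue in fragment:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def sensitivity(true_labels, predicted_labels):
    """Fraction of true sites recovered: TP / (TP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    for position, fragment in dissect("MKRSASLPEDTY"):
        print(position, fragment, len(one_hot(fragment)))
```

In a leave-one-out test of the kind the abstract reports, each labelled fragment would be held out in turn, the model trained on the rest, and `sensitivity` computed over the held-out predictions.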