A support vector machine approach to the identification of phosphorylation sites

Bibliographic Details
Published in: Cellular & molecular biology letters Vol. 10; no. 1; p. 73
Main Authors: Plewczyński, Dariusz; Tkacz, Adrian; Godzik, Adam; Rychlewski, Leszek
Format: Journal Article
Language: English
Published: England 2005
Abstract: We describe a bioinformatics tool that predicts the positions of phosphorylation sites in proteins from sequence information alone. The method is based on support vector machine (SVM) statistical learning. Statistical models for phosphorylation by various types of kinases are built from a dataset of short sequence fragments, each 9 amino acids long. The fragments are extracted around post-translationally modified sites of proteins that are in the current release of the Swiss-Prot database and that were experimentally confirmed to be phosphorylated by any kinase. Each fragment is represented as a vector in a multidimensional abstract space of short sequence fragments. Prediction proceeds as follows. First, a given query protein sequence is dissected into overlapping short segments. All the fragments are then projected into the multidimensional space of sequence fragments via a collection of different representations. The resulting points are classified by pre-built statistical models (SVMs with linear, polynomial and radial kernel functions) as either phosphorylated or inactive. The list of plausible sites for phosphorylation by various types of kinases in the query protein is then returned to the user. The efficiency of the method for each type of phosphorylation is estimated using leave-one-out tests and presented here. The sensitivities of the models can exceed 70%, depending on the type of kinase. Additional information from profile representations of short sequence fragments improves accuracy for some phosphorylation types. Further development of an automatic phosphorylation site annotation predictor based on our algorithm should yield a significant improvement when statistical algorithms are used to quantify the results.
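The first steps of the prediction procedure described in the abstract (dissecting a query into overlapping 9-residue segments and projecting each one into a vector space) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a plain sliding window, a one-hot encoding over the 20 standard amino acids (only one possible representation; the record does not specify which encodings the tool uses), and textbook forms of the linear, polynomial and radial kernels with hypothetical default parameters. All function names are illustrative.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
WINDOW = 9  # fragment length stated in the abstract


def overlapping_fragments(sequence, window=WINDOW):
    """Return (start_index, fragment) for every full-length window (0-based)."""
    return [(i, sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]


def one_hot(fragment):
    """Encode a fragment as a flat 0/1 vector, 20 entries per residue.

    Assumption: residues outside AMINO_ACIDS become all-zero blocks.
    """
    vector = []
    for residue in fragment:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, y> + coef0) ** degree; parameter values are hypothetical."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.1):
    """exp(-gamma * ||x - y||^2); gamma is a hypothetical default."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Usage: an 11-residue toy sequence yields three overlapping 9-mers.
fragments = overlapping_fragments("ACDEFGHIKLM")
vectors = [one_hot(fragment) for _, fragment in fragments]
```

In a full pipeline, vectors like these would be passed to a trained SVM, which would label each fragment as phosphorylated or inactive. Training, the profile representations and the leave-one-out evaluation are outside the scope of this sketch.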
Author affiliation: Plewczyński, Dariusz: BioInfoBank Institute, Limanowskiego 24A/16, 60-744 Poznań, Poland. Email: darman@bioinfo.pl
PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/15809681
Discipline: Biology
Genre: Journal Article; Research Support, Non-U.S. Gov't
ISSN: 1425-8153
PMID: 15809681
Peer reviewed: yes
Journal abbreviation: Cell Mol Biol Lett
Subjects: Algorithms; Computational Biology; Cyclic AMP-Dependent Protein Kinases - metabolism; Databases, Protein; Phosphorylation; Protein Kinase C - metabolism; Proteins - chemistry; Proteins - metabolism