Subsequence Feature Maps for Protein Function Annotation

Bibliographic Details
Main Author: Saraç, Ömer Sinan
Format: Dissertation
Language: English
Published: ProQuest Dissertations & Theses 01-01-2008
Description
Summary: With advances in sequencing technologies, the number of protein sequences with unknown function is increasing rapidly. Hence, computational methods for the functional annotation of these sequences have become of the utmost importance. In this thesis, we first defined a feature space mapping of protein primary sequences to fixed-dimensional numerical vectors. This mapping, called the Subsequence Profile Map (SPMap), takes into account the models of the subsequences of protein sequences. The resulting vectors were used as input to support vector machines (SVMs) for functional classification of proteins. Second, we defined the protein functional annotation problem as a classification problem and constructed a classification framework defined on Gene Ontology (GO) terms. Different classification methods, as well as their combinations, were assessed on this framework, which is based on 300 GO molecular function terms. The results showed that combination enhances the classification accuracy. The resultant system is made publicly available as an online function annotation tool.
ISBN:9798342517805