Dimension Reduction via Unsupervised Learning Yields Significant Computational Improvements for Support Vector Machine Based Protein Family Classification

Bibliographic Details
Published in: 2008 Seventh International Conference on Machine Learning and Applications, pp. 457-462
Main Authors: Webb-Robertson, B.-J.M., Matzke, M.M., Oehmen, C.S.
Format: Conference Proceeding
Language:English
Published: IEEE 01-12-2008
Description
Summary: Reducing the dimension of vectors used in training support vector machines (SVMs) results in a proportional speedup in training time. For large-scale problems this can make the difference between tractable and intractable training tasks. However, it is critical that classifiers trained on reduced datasets perform as reliably as their counterparts trained on high-dimensional data. We assessed principal component analysis (PCA) and sequential projection pursuit (SPP) as dimension reduction strategies in the biology application of classifying proteins into well-defined functional 'families' (SVM-based protein family classification) by their impact on run-time, sensitivity and selectivity. Homology vectors of 4352 elements were reduced to approximately 2% of the original data size using PCA and SPP without significantly affecting accuracy, while leading to approximately a 28-fold speedup in run-time.
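The PCA step described in the summary can be sketched as follows. This is a minimal NumPy illustration of projecting high-dimensional feature vectors onto their top principal components before SVM training, not the authors' implementation; the random data, the sample count, and the target dimension of 87 (roughly 2% of 4352) are assumptions for demonstration.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components.

    X : (n_samples, n_features) data matrix, e.g. homology vectors.
    k : target dimension after reduction.
    """
    Xc = X - X.mean(axis=0)                # center each feature
    # SVD of the centered data; rows of Vt are the principal directions,
    # ordered by decreasing singular value (explained variance)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                   # (n_samples, k) reduced data

# Toy stand-in for 4352-dimensional homology vectors (assumed sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4352))
Z = pca_reduce(X, 87)                      # ~2% of the original dimension
print(Z.shape)                             # (200, 87)
```

An SVM trained on `Z` instead of `X` then operates on 87-element vectors rather than 4352-element ones, which is the source of the proportional training-time speedup the paper reports.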
ISBN: 0769534953, 9780769534954
DOI: 10.1109/ICMLA.2008.120