Monte Carlo feature selection for supervised classification

Bibliographic Details
Published in: Bioinformatics, Vol. 24, no. 1, pp. 110-117
Main Authors: Dramiński, Michał; Rada-Iglesias, Alvaro; Enroth, Stefan; Wadelius, Claes; Koronacki, Jacek; Komorowski, Jan
Format: Journal Article
Language: English
Published: Oxford: Oxford University Press, 01-01-2008
Oxford Publishing Limited (England)
Description
Summary:
Motivation: Pre-selection of informative features for supervised classification is a crucial, albeit delicate, task. It is desirable that feature selection provides the features that contribute most to the classification task per se and which should therefore be used by any classifier later used to produce classification rules. In this article, a conceptually simple but computer-intensive approach to this task is proposed. The reliability of the approach rests on multiple construction of a tree classifier for many training sets randomly chosen from the original sample set, where samples in each training set consist of only a fraction of all of the observed features.
Results: The resulting ranking of features may then be used to advantage for classification via a classifier of any type. The approach was validated using Golub et al. leukemia data and the Alizadeh et al. lymphoma data. Not surprisingly, we obtained a significantly different list of genes. Biological interpretation of the genes selected by our method showed that several of them are involved in precursors to different types of leukemia and lymphoma rather than being genes that are common to several forms of cancers, which is the case for the other methods.
Availability: Prototype available upon request.
Contact: jan.komorowski@lcb.uu.se
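The resampling idea in the summary (many classifiers, each trained on a random set of samples restricted to a random fraction of the features) can be illustrated with a minimal sketch. This is not the authors' prototype: it assumes a one-split decision stump in place of a full tree classifier, and it assumes a simple scoring rule in which a feature is credited with the held-out accuracy of every stump that splits on it. The names `fit_stump` and `mc_feature_ranking`, all parameter values, and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Fit a one-split tree (stump) restricted to `features`.

    Returns (feature, threshold, left_label, right_label) with the fewest
    training errors, or None if no candidate split exists.
    """
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(l != left_label for l in left)
                      + sum(r != right_label for r in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return None if best is None else best[1:]


def mc_feature_ranking(X, y, n_subsets=200, subset_size=2, n_splits=3,
                       train_frac=0.66, seed=0):
    """Rank features by accumulated held-out accuracy (assumed scoring rule)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    scores = [0.0] * n_features
    indices = list(range(len(X)))
    for _ in range(n_subsets):
        # Each round sees only a random fraction of the features.
        features = rng.sample(range(n_features), subset_size)
        for _ in range(n_splits):
            # Random training set drawn from the original samples.
            rng.shuffle(indices)
            cut = int(train_frac * len(indices))
            train, test = indices[:cut], indices[cut:]
            stump = fit_stump([X[i] for i in train],
                              [y[i] for i in train], features)
            if stump is None:
                continue
            f, t, left_label, right_label = stump
            hits = sum((left_label if X[i][f] <= t else right_label) == y[i]
                       for i in test)
            # Credit the feature the stump split on with its test accuracy.
            scores[f] += hits / len(test)
    ranking = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranking, scores


# Toy data (hypothetical): feature 0 separates the classes, features 1-4 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[2 * label + data_rng.random()] + [data_rng.random() for _ in range(4)]
     for label in y]
ranking, scores = mc_feature_ranking(X, y)
print(ranking)
```

Because each feature is scored only through classifiers that were denied most of the other features, the resulting ranking does not depend on a single fitted model, which is in the spirit of the summary's claim that the ranking can then be used with a classifier of any type.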
Bibliography:istex:E92A3B3889B435B3D3753360BCF47651E754743A
ark:/67375/HXZ-4NP8HCTL-H
Associate Editor: Joaquin Dopazo
ISSN: 1367-4803, 1367-4811, 1460-2059
DOI: 10.1093/bioinformatics/btm486