A Novel Feature Selection Method for High-Dimensional Mixed Decision Tables


Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. PP, no. 7, pp. 1 - 14
Main Authors: Thuy, Nguyen Ngoc, Wongthanavasu, Sartra
Format: Journal Article
Language:English
Published: United States: IEEE, 01-07-2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: Attribute reduction, also called feature selection, is one of the most important issues in rough set theory and is regarded as a vital preprocessing step in pattern recognition, machine learning, and data mining. Nowadays, high-dimensional mixed and incomplete data sets are very common in real-world applications, and selecting a promising feature subset from such data sets is an interesting but challenging problem. Almost all existing methods generate a cover on the space of objects to determine important features. However, some tolerance classes in the cover are useless for the computational process. Thus, this article introduces a new concept of stripped neighborhood covers to remove unnecessary tolerance classes from the original cover. Based on the proposed stripped neighborhood cover, we define a new reduct in mixed and incomplete decision tables and then design an efficient heuristic algorithm to find this reduct. In each iteration of the algorithm's main loop, we use an error measure to select an optimal feature and add it to the selected feature subset. In addition, to handle high-dimensional data sets more efficiently, we also identify redundant features after each iteration and remove them from the candidate feature subset. To verify the performance of the proposed algorithm, we carry out experiments on data sets downloaded from public data sources and compare against existing state-of-the-art algorithms. Experimental results show that our algorithm outperforms the compared algorithms, especially in classification accuracy.
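The heuristic loop described in the summary — in each iteration, pick the candidate feature that minimizes an error measure, then prune candidates that have become redundant — can be sketched as follows. This is a generic greedy forward-selection skeleton, not the paper's algorithm: the error measure below is a hypothetical stand-in, whereas the paper's measure is computed from stripped neighborhood covers.

```python
def greedy_reduct(features, error, tol=1e-9):
    """Greedy forward selection with per-iteration redundancy pruning.

    `error(subset)` is assumed to return a nonnegative error measure that
    a good feature subset drives toward zero (a stand-in for the paper's
    stripped-neighborhood-based measure).
    """
    selected = []
    candidates = list(features)
    current = error(selected)
    while candidates:
        # select the candidate that most reduces the error measure
        best = min(candidates, key=lambda f: error(selected + [f]))
        best_err = error(selected + [best])
        if current - best_err <= tol:  # no candidate improves: stop
            break
        selected.append(best)
        candidates.remove(best)
        current = best_err
        # prune candidates that no longer reduce the error (redundant)
        candidates = [f for f in candidates
                      if current - error(selected + [f]) > tol]
    return selected

# Toy example: only features "a" and "c" reduce the error measure,
# so "b" and "d" are pruned as redundant in the first iteration.
weights = {"a": 0.5, "b": 0.0, "c": 0.3, "d": 0.0}
err = lambda subset: 1.0 - sum(weights[f] for f in subset)
print(greedy_reduct(["a", "b", "c", "d"], err))  # prints ['a', 'c']
```

The per-iteration pruning is what targets high-dimensional tables: candidates that cannot improve the measure are dropped from all later iterations instead of being re-evaluated.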
ISSN:2162-237X
2162-2388
DOI:10.1109/TNNLS.2020.3048080