Improving nature-inspired algorithms for feature selection

Bibliographic Details
Published in: Journal of Ambient Intelligence and Humanized Computing, Vol. 13, No. 6, pp. 3025-3035
Main Authors: Al-Thanoon, Niam Abdulmunim; Qasim, Omar Saber; Algamal, Zakariya Yahya
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01-06-2022
Springer Nature B.V.
Description
Summary: Selecting highly discriminative features from a whole feature set has become an important research area. Not only can this improve classification performance, but it can also decrease the cost of system diagnosis when a large number of noisy, redundant features are excluded. Binary nature-inspired algorithms have been used as feature selection procedures. Each of these algorithms requires an initial population to be set, and the appropriateness of this initialization plays a key role in the final result. At the population-initialization stage, the positions are initialized at random from a uniform distribution, which leads to high variability in the classification results. To avoid the randomness of the generated population and to take into account the relation between each feature and the class variable, parametric and non-parametric methods, namely the t-test and the Wilcoxon rank-sum test, are proposed for constructing the initial population in binary nature-inspired algorithms. This modification can help these binary algorithms enhance global exploration and local exploitation without exhibiting the slow convergence speed of the standard procedure. The binary bat, gray wolf, and whale algorithms are considered. The performance of our proposed methods is evaluated on ten publicly available high-dimensional and low-dimensional datasets. The experimental results and statistical analysis confirm that our proposed methods outperform the standard algorithms in terms of classification accuracy, number of selected features, running time, and feature selection stability.
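
The summary above describes replacing uniform-random population initialization with initialization guided by per-feature statistical tests against the class variable. The sketch below is a minimal Python illustration of that idea, assuming a two-class problem; the function names and the mapping from p-values to inclusion probabilities (smaller p-value, higher chance the feature starts switched on) are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch: test-guided initial population for a binary
# nature-inspired feature-selection algorithm (e.g., binary bat, gray wolf,
# or whale optimizer). The p-value-to-probability mapping is an assumption.
import numpy as np
from scipy import stats


def guided_population(X, y, pop_size, test="ttest", rng=None):
    """Binary initial population guided by per-feature two-sample tests.

    X : (n_samples, n_features) data matrix
    y : binary class labels
    test : "ttest" (parametric) or "wilcoxon" (rank-sum, non-parametric)
    """
    rng = np.random.default_rng(rng)
    classes = np.unique(y)
    a, b = X[y == classes[0]], X[y == classes[1]]

    p_values = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        if test == "ttest":
            _, p = stats.ttest_ind(a[:, j], b[:, j], equal_var=False)
        else:
            _, p = stats.ranksums(a[:, j], b[:, j])
        p_values[j] = p

    # Assumed mapping: a feature strongly related to the class (small p-value)
    # gets a high probability of being selected in the initial positions.
    include_prob = 1.0 - p_values
    return (rng.random((pop_size, X.shape[1])) < include_prob).astype(int)


def random_population(n_features, pop_size, rng=None):
    """Standard uniform-random binary initialization, for comparison."""
    rng = np.random.default_rng(rng)
    return rng.integers(0, 2, size=(pop_size, n_features))
```

Either population matrix could then be passed to a binary bat, gray wolf, or whale optimizer as its starting positions; only the initialization step differs from the standard procedure.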
ISSN: 1868-5137
1868-5145
DOI: 10.1007/s12652-021-03136-6