Feature Selection for Imbalanced Data with Deep Sparse Autoencoders Ensemble
Main Authors: | , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 22-03-2021 |
Summary: | Class imbalance is a common issue in many application domains of learning
algorithms. In these domains, it is often much more important to
correctly classify and profile minority class observations. This need can be
addressed by Feature Selection (FS), which offers several further advantages,
such as decreasing computational costs and aiding inference and interpretability.
However, traditional FS techniques may become sub-optimal in the presence of
strongly imbalanced data. To achieve FS advantages in this setting, we propose
a filtering FS algorithm that ranks feature importance on the basis of the
Reconstruction Error of a Deep Sparse AutoEncoders Ensemble (DSAEE). We use
each DSAE, trained only on the majority class, to reconstruct both classes. From the
analysis of the aggregated Reconstruction Error, we determine the features
where the minority class presents a different distribution of values with respect to the
overrepresented one, thus identifying the most relevant features to
discriminate between the two. We empirically demonstrate the efficacy of our
algorithm in several experiments on high-dimensional datasets of varying sample
size, showcasing its capability to select relevant and generalizable features
to profile and classify the minority class, outperforming other benchmark FS
methods. We also briefly present a real application in radiogenomics, where the
methodology was applied successfully. |
---|---|
DOI: | 10.48550/arxiv.2103.11678 |
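
The ranking idea described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a shallow, one-hidden-layer autoencoder with an L1 activation penalty in place of a deep sparse autoencoder, a small ensemble that varies only the hidden size and seed, and a feature score equal to the mean minority reconstruction error minus the mean held-out majority reconstruction error. The function names (`train_sparse_ae`, `reconstruction_error`, `rank_features`), the hyperparameters, and the synthetic data are all hypothetical, and features are assumed to be standardized already.

```python
import numpy as np

def train_sparse_ae(X, n_hidden, l1=1e-3, lr=0.01, epochs=500, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh hidden, linear output) by
    full-batch gradient descent, with an L1 penalty on hidden activations."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden code
        R = H @ W2 + b2                          # reconstruction
        dR = 2.0 * (R - X) / n                   # grad of squared error term
        dH = dR @ W2.T + l1 * np.sign(H) / n     # plus sparsity (L1) term
        dZ = dH * (1.0 - H ** 2)                 # back through tanh
        W2 -= lr * (H.T @ dR); b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (X.T @ dZ); b1 -= lr * dZ.sum(axis=0)
    return W1, b1, W2, b2

def reconstruction_error(params, X):
    """Squared reconstruction error per sample and per feature."""
    W1, b1, W2, b2 = params
    R = np.tanh(X @ W1 + b1) @ W2 + b2
    return (R - X) ** 2

def rank_features(X_maj_train, X_maj_held, X_min, hidden_sizes=(4, 6, 8)):
    """Train each ensemble member on majority data only, then score every
    feature by how much worse the minority class is reconstructed."""
    scores = np.zeros(X_min.shape[1])
    for seed, h in enumerate(hidden_sizes):
        params = train_sparse_ae(X_maj_train, h, seed=seed)
        err_min = reconstruction_error(params, X_min).mean(axis=0)
        err_maj = reconstruction_error(params, X_maj_held).mean(axis=0)
        scores += (err_min - err_maj) / len(hidden_sizes)  # ensemble average
    return np.argsort(scores)[::-1], scores

# Synthetic demo (assumed data): the minority class differs from the
# majority class only in features 0, 1 and 2.
rng = np.random.default_rng(42)
d = 20
X_maj = rng.normal(size=(400, d))
X_min = rng.normal(size=(30, d))
X_min[:, :3] += 3.0

ranking, scores = rank_features(X_maj[:300], X_maj[300:], X_min)
top3 = sorted(ranking[:3].tolist())
print("top-ranked features:", top3)
```

Subtracting the held-out majority error is one simple way to separate features that are hard to reconstruct in general from features that are hard to reconstruct specifically for the minority class. The aggregation used in the paper may differ.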