A data-science pipeline to enable the Interpretability of Many-Objective Feature Selection
Main Authors: | , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 30-11-2023 |
Summary: | Many-Objective Feature Selection (MOFS) approaches use four or more
objectives to determine the relevance of a subset of features in a supervised
learning task. As a consequence, MOFS typically returns a large set of
non-dominated solutions, which have to be assessed by the data scientist in
order to proceed with the final choice. Given the multi-variate nature of the
assessment, which may include criteria (e.g. fairness) not related to
predictive accuracy, this step is often not straightforward and suffers from
the lack of existing tools. For instance, it is common to make use of a tabular
presentation of the solutions, which provides little information about the
trade-offs and the relations between criteria over the set of solutions.
This paper proposes an original methodology to support data scientists in the
interpretation and comparison of the MOFS outcome by combining post-processing
and visualisation of the set of solutions. The methodology supports the data
scientist in the selection of an optimal feature subset by providing her with
high-level information at three different levels: objectives, solutions, and
individual features.
The methodology is experimentally assessed on two feature selection tasks
adopting a GA-based MOFS with six objectives (number of selected features,
balanced accuracy, F1-Score, variance inflation factor, statistical parity, and
equalised odds). The results show the added value of the methodology in the
selection of the final subset of features. |
---|---|
DOI: | 10.48550/arxiv.2311.18746 |
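
The summary refers to the set of non-dominated solutions returned by MOFS. As a rough illustration of that term only, and not the paper's own pipeline, the following Python sketch filters candidate feature subsets down to the non-dominated ones. The function name `non_dominated_mask`, the score values, and the optimisation directions (balanced accuracy and F1-Score maximised, the other four objectives minimised) are all assumptions made for the example.

```python
import numpy as np


def non_dominated_mask(scores: np.ndarray) -> np.ndarray:
    """Return a boolean mask marking rows that no other row dominates.

    Every column is assumed to be minimised. Row j dominates row i when it is
    no worse on every objective and strictly better on at least one.
    """
    n = scores.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        no_worse = np.all(scores <= scores[i], axis=1)
        strictly_better = np.any(scores < scores[i], axis=1)
        if np.any(no_worse & strictly_better):
            mask[i] = False
    return mask


# Hypothetical scores, one row per candidate feature subset. Columns:
# number of selected features, balanced accuracy, F1-Score,
# variance inflation factor, statistical parity, equalised odds.
candidates = np.array([
    [5, 0.80, 0.78, 2.0, 0.10, 0.12],
    [8, 0.85, 0.83, 3.5, 0.08, 0.09],
    [8, 0.79, 0.77, 3.6, 0.11, 0.13],
    [3, 0.70, 0.69, 1.5, 0.15, 0.20],
])

# Assumed directions: negate the two maximised objectives so that every
# column can be treated as "lower is better".
signs = np.array([1, -1, -1, 1, 1, 1])
mask = non_dominated_mask(candidates * signs)
print(mask)
```

With these made-up values, the third candidate is dominated by the first and is dropped, while the remaining three trade fewer features against higher accuracy and so all stay in the set. This is why a six-objective search tends to leave many solutions for the data scientist to compare.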