Characterizing instance hardness in classification and regression problems
Main Authors: | , , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 04-12-2022 |
Summary: | Recent work in the Machine Learning (ML) literature has
demonstrated the usefulness of identifying the observations whose labels are
hardest to predict accurately. Once identified, such instances can be
inspected for quality issues that should be addressed. Learning
strategies based on the difficulty level of the observations can also be
devised. This paper presents a set of meta-features, known as instance
hardness measures, that aim to characterize which instances of a dataset are
hardest to predict accurately and why. Both
classification and regression problems are considered. Synthetic datasets with
different levels of complexity are built and analyzed. A Python package
containing all implementations is also provided. |
---|---|
DOI: | 10.48550/arxiv.2212.01897 |