Reliability Evaluation of Individual Predictions: A Data-centric Approach
Format: | Journal Article |
---|---|
Language: | English |
Published: | 15-04-2022 |
Summary: | Machine learning models only provide probabilistic guarantees on the expected
loss of random samples from the distribution represented by their training
data. As a result, a model with high accuracy may or may not be reliable for
predicting an individual query point. To address this issue, explainable AI (XAI)
aims to provide explanations of individual predictions, while approaches such as
conformal prediction, probabilistic prediction, and prediction intervals
rely on the model's certainty in its prediction to identify unreliable cases.
In contrast, instead of relying on the model itself, we look for insights in
the training data. That is, since a model's performance is limited by the data
it has been trained on, we ask: "is a model trained on a given data set fit for
making a specific prediction?" Specifically, we argue that a model's prediction
is not reliable if (i) the training set contained too few instances similar to
the query point, and (ii) there is high fluctuation (uncertainty) among the
training instances in the vicinity of the query point. Using these two
observations, we propose data-centric reliability measures
for individual predictions and develop novel algorithms for efficient and
effective computation of these measures at inference time. The proposed
algorithms learn the necessary components of the measures from the data itself
and have sublinear time complexity, which makes them scalable to very large,
multi-dimensional settings. Furthermore, we design an estimator that removes the
need to access the training data at inference time. We conduct extensive
experiments on multiple real and synthetic data sets and different tasks; the
results show a consistent correlation between distrust values (the output of the
proposed measures) and model performance. |
---|---|
DOI: | 10.48550/arxiv.2204.07682 |
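The two conditions in the summary (too few similar training instances; high fluctuation near the query) can be illustrated with a minimal brute-force sketch. This is not the authors' algorithm, which learns its components from the data and runs in sublinear time. The sketch assumes Euclidean distance and numeric labels (0/1 classes or regression targets). It approximates "coverage" as the number of training points within a fixed radius and "local uncertainty" as the variance of the k nearest neighbours' labels. It flags a prediction when either statistic crosses a threshold; that combination rule is also an assumption. The function names, `radius`, `k`, both thresholds, and the toy data are hypothetical.

```python
import math
from statistics import pvariance

# Hypothetical toy training set: a consistent cluster near the origin
# (all labels 0) and a small cluster near (5, 5) with mixed labels.
TRAIN_X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
TRAIN_Y = [0, 0, 0, 0, 0, 1, 1]


def coverage_and_uncertainty(train_x, train_y, query, radius, k):
    """Return (coverage, uncertainty) for one query point (hypothetical proxies).

    coverage:    number of training points within `radius` of `query`
                 (Euclidean distance), a proxy for condition (i).
    uncertainty: population variance of the labels of the k nearest
                 training points, a proxy for condition (ii).
    Brute force: computes the distance to every training point.
    """
    dists = [math.dist(x, query) for x in train_x]
    coverage = sum(1 for d in dists if d <= radius)
    nearest = sorted(range(len(train_x)), key=lambda i: dists[i])[:k]
    uncertainty = pvariance([train_y[i] for i in nearest])
    return coverage, uncertainty


def is_unreliable(train_x, train_y, query, radius=1.0, k=3,
                  min_coverage=3, max_variance=0.1):
    """Flag the prediction if either proxy crosses its (hypothetical) threshold."""
    coverage, uncertainty = coverage_and_uncertainty(
        train_x, train_y, query, radius, k)
    return coverage < min_coverage or uncertainty > max_variance


if __name__ == "__main__":
    for q in [(0.05, 0.05), (5.05, 5.05), (20.0, 20.0)]:
        print(q, is_unreliable(TRAIN_X, TRAIN_Y, q))
    # (0.05, 0.05) False  -> dense, consistent neighbourhood
    # (5.05, 5.05) True   -> enough neighbours, but mixed labels
    # (20.0, 20.0) True   -> no training points nearby
```

Each query here scans the whole training set, so the cost grows linearly with its size. A spatial index (for example a k-d tree) would be the usual first step toward the scalability the summary describes, but the paper's own sublinear algorithms and data-free estimator are not reproduced here.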