Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments
Main Authors: | , , , , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 28-10-2024 |
Summary: | As machine learning models evolve, maintaining transparency demands more
human-centric explainable AI techniques. Counterfactual explanations, with
roots in human reasoning, identify the minimal input changes needed to obtain a
given output and, hence, are crucial for supporting decision-making. Despite
their importance, the evaluation of these explanations often lacks grounding in
user studies and remains fragmented, with existing metrics not fully capturing
human perspectives. To address this challenge, we developed a diverse set of 30
counterfactual scenarios and collected ratings across 8 evaluation metrics from
206 respondents. Subsequently, we fine-tuned different Large Language Models
(LLMs) to predict average or individual human judgment across these metrics.
Our methodology allowed LLMs to achieve an accuracy of up to 63% in zero-shot
evaluations and 85% (on a 3-class prediction task) with fine-tuning across all
metrics. The fine-tuned models predicting human ratings offer better
comparability and scalability in evaluating different counterfactual
explanation frameworks. |
DOI: | 10.48550/arxiv.2410.21131 |