Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments

Bibliographic Details
Main Authors: Domnich, Marharyta, Valja, Julius, Veski, Rasmus Moorits, Magnifico, Giacomo, Tulver, Kadi, Barbu, Eduard, Vicente, Raul
Format: Journal Article
Language: English
Published: 28-10-2024
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language
Online Access: https://arxiv.org/abs/2410.21131
DOI: 10.48550/arxiv.2410.21131
Copyright: http://creativecommons.org/licenses/by/4.0 (CC BY 4.0)

Abstract: As machine learning models evolve, maintaining transparency demands more human-centric explainable AI techniques. Counterfactual explanations, with roots in human reasoning, identify the minimal input changes needed to obtain a given output and are hence crucial for supporting decision-making. Despite their importance, the evaluation of these explanations often lacks grounding in user studies and remains fragmented, with existing metrics not fully capturing human perspectives. To address this challenge, we developed a diverse set of 30 counterfactual scenarios and collected ratings across 8 evaluation metrics from 206 respondents. We then fine-tuned different Large Language Models (LLMs) to predict average or individual human judgments across these metrics. Our methodology allowed LLMs to achieve accuracies of up to 63% in zero-shot evaluation and 85% (over a 3-class prediction) with fine-tuning across all metrics. The fine-tuned models predicting human ratings offer better comparability and scalability for evaluating different counterfactual explanation frameworks.
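The record gives no implementation details beyond the abstract, so the following is only a minimal sketch of the kind of setup the abstract describes: fine-tuning a language model to predict binned human ratings (a 3-class prediction) for counterfactual scenarios. The model name, the text encoding of scenario/counterfactual/metric, and the low/medium/high label binning are all illustrative assumptions, not the authors' actual choices.

    # Hedged sketch: fine-tune a small encoder to predict 3-class human
    # ratings for counterfactual explanations. Model, data format, and
    # label scheme are assumptions; the record does not specify them.
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    # Hypothetical rows: one per (scenario, counterfactual, metric) triple,
    # with the human rating binned into 0 = low, 1 = medium, 2 = high.
    rows = [
        {"text": "Scenario: loan denied. Counterfactual: raise income by 5k. "
                 "Metric: feasibility.", "label": 2},
        {"text": "Scenario: loan denied. Counterfactual: change age 40 to 20. "
                 "Metric: feasibility.", "label": 0},
    ]
    ds = Dataset.from_list(rows)

    model_name = "distilbert-base-uncased"  # placeholder, not the paper's LLMs
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=3)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=128)

    ds = ds.map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="cf-rating-clf",
                               per_device_train_batch_size=8,
                               num_train_epochs=3),
        train_dataset=ds,
    )
    trainer.train()

Under this reading, the abstract's 85% figure would correspond to evaluating such a fine-tuned model on held-out human ratings, while the 63% zero-shot figure would come from prompting an LLM without any training step.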