Fréchet Distance for Offline Evaluation of Information Retrieval Systems with Sparse Labels
Main Authors: | , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 30-01-2024 |
Subjects: | |
Summary: | The rapid advancement of natural language processing, information retrieval
(IR), computer vision, and other technologies has presented significant
challenges in evaluating the performance of these systems. One of the main
challenges is the scarcity of human-labeled data, which hinders the fair and
accurate assessment of these systems. In this work, we specifically focus on
evaluating IR systems with sparse labels, borrowing from recent research on
evaluating computer vision tasks and taking inspiration from the success of the
Fréchet Inception Distance (FID) in assessing text-to-image generation
systems. We propose leveraging the Fréchet Distance to measure the distance
between the distributions of relevant judged items and retrieved results. Our
experimental results on the MS MARCO V1 dataset and the TREC Deep Learning Tracks
query sets demonstrate the effectiveness of the Fréchet Distance as a metric for
evaluating IR systems, particularly in settings where only a few labels are
available. This approach contributes to the advancement of evaluation
methodologies in real-world scenarios such as the assessment of generative IR
systems. |
---|---|
DOI: | 10.48550/arxiv.2401.17543 |
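
The summary describes measuring the Fréchet Distance between the distribution of relevant judged items and the distribution of retrieved results. As a rough illustration only, the sketch below uses the closed form that FID relies on, which assumes each set of embeddings is approximately Gaussian: d² = ‖μ_r − μ_s‖² + Tr(Σ_r + Σ_s − 2(Σ_r Σ_s)^{1/2}). The function name `frechet_distance`, the `eps` regularizer, and the use of random vectors in place of real document embeddings are assumptions made for this example; the record does not state which embedding model or implementation the authors use.

```python
import numpy as np


def frechet_distance(relevant, retrieved, eps=1e-6):
    """Squared Frechet distance between Gaussians fitted to two embedding sets.

    relevant, retrieved: arrays of shape (n_items, dim), one embedding per row.
    eps: small diagonal term (an assumption of this sketch) that keeps the
         covariance estimates well conditioned when few items are available.
    """
    relevant = np.asarray(relevant, dtype=float)
    retrieved = np.asarray(retrieved, dtype=float)
    dim = relevant.shape[1]

    # Fit a Gaussian (mean and covariance) to each set of embeddings.
    mu_r = relevant.mean(axis=0)
    mu_s = retrieved.mean(axis=0)
    cov_r = np.cov(relevant, rowvar=False) + eps * np.eye(dim)
    cov_s = np.cov(retrieved, rowvar=False) + eps * np.eye(dim)

    # Tr((cov_r cov_s)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_s, which are real and non-negative for
    # positive semi-definite inputs; clip guards against rounding error.
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * trace_sqrt)


# Toy usage with synthetic vectors standing in for document embeddings.
rng = np.random.default_rng(0)
judged_relevant = rng.normal(size=(200, 4))
good_run = judged_relevant + rng.normal(scale=0.05, size=(200, 4))
poor_run = judged_relevant + 3.0  # same spread, shifted far from the relevant set

print(frechet_distance(judged_relevant, good_run))  # small value
print(frechet_distance(judged_relevant, poor_run))  # much larger value
```

Under these assumptions a lower value means the retrieved results sit closer, in embedding space, to the judged relevant items, so the score can be computed even when only a few labels exist per query.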