Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation

Rating-based human evaluation has become an essential tool to accurately evaluate the impressive performance of large language models (LLMs). However, current rating systems suffer from several important limitations: first, they fail to account for biases that significantly influence evaluation resu...

Bibliographic Details
Main Authors: Dekoninck, Jasper, Baader, Maximilian, Vechev, Martin
Format: Journal Article
Language: English
Published: 01-09-2024