Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation

Rating-based human evaluation has become an essential tool to accurately evaluate the impressive performance of large language models (LLMs). However, current rating systems suffer from several important limitations: first, they fail to account for biases that significantly influence evaluation resu...

Bibliographic Details
Main Authors:	Dekoninck, Jasper; Baader, Maximilian; Vechev, Martin
Format:	Journal Article
Language:	English
Published:	01-09-2024
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Computation and Language