Human or Machine: Automating Human Likeliness Evaluation of NLG Texts

Bibliographic Details
Main Authors: Çano, Erion, Bojar, Ondřej
Format: Journal Article
Language: English
Published: 04-06-2020
Description
Summary: Automatic evaluation of various text quality criteria produced by data-driven intelligent methods is very common and useful because it is cheap, fast, and usually yields repeatable results. In this paper, we present an attempt to automate the human likeliness evaluation of the output text samples coming from natural language generation methods used to solve several tasks. We propose to use a human likeliness score that shows the percentage of the output samples from a method that look as if they were written by a human. Instead of having human participants label or rate those samples, we completely automate the process by using a discrimination procedure based on large pretrained language models and their probability distributions. As a follow-up, we plan to perform an empirical analysis of human-written and machine-generated texts to find the optimal setup of this evaluation approach. A validation procedure involving human participants will also check how the automatic evaluation correlates with human judgments.
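The summary describes the discrimination procedure only at a high level. The following is a minimal illustrative sketch, not the authors' implementation: it assumes GPT-2 (via the Hugging Face transformers library) as the pretrained language model, uses per-sample perplexity as the discrimination signal, and treats the threshold tau and its direction as placeholder assumptions, following the common observation in detection work that machine-generated text tends to be unusually predictable to a strong LM.

# Illustrative sketch only: the paper specifies a discrimination procedure
# based on pretrained LMs and their probability distributions, but not this
# exact implementation. Model choice (GPT-2) and threshold `tau` are assumed.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the pretrained LM."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean
        # next-token cross-entropy over the sequence.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

def human_likeliness_score(samples: list[str], tau: float = 40.0) -> float:
    """Percentage of samples the discriminator accepts as human-like.

    Assumption: machine-generated text is often unusually predictable
    (low perplexity) to a strong LM, so samples with perplexity >= tau
    are counted as human-like here. The value of `tau` would have to be
    tuned empirically, as the paper's planned follow-up analysis suggests.
    """
    human_like = sum(1 for s in samples if perplexity(s) >= tau)
    return 100.0 * human_like / len(samples)

In this reading, the score is simply the fraction of a method's outputs that the LM-based discriminator fails to flag as machine-generated; the paper's planned validation against human judgments would determine whether such a perplexity-based setup is the right instantiation.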
DOI:10.48550/arxiv.2006.03189