Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry
Published in: | Computers in human behavior Vol. 114; p. 106553 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | Elmsford, Elsevier Ltd, 01-01-2021, Elsevier Science Ltd |
Subjects: | |
Summary: | The release of openly available, robust natural language generation (NLG) algorithms has spurred much public attention and debate. One reason lies in the algorithms' purported ability to generate humanlike text across various domains. Empirical evidence using incentivized tasks to assess whether people (a) can distinguish and (b) prefer algorithm-generated versus human-written text is lacking. We conducted two experiments assessing behavioral reactions to the state-of-the-art natural language generation algorithm GPT-2 (total N = 830). Using the identical starting lines of human poems, GPT-2 produced samples of poems. From these samples, either a random poem was chosen (Human-out-of-the-loop) or the best one was selected (Human-in-the-loop) and in turn matched with a human-written poem. In a new incentivized version of the Turing Test, participants failed to reliably detect the algorithmically generated poems in the Human-in-the-loop treatment, yet succeeded in the Human-out-of-the-loop treatment. Further, people reveal a slight aversion to algorithm-generated poetry, independent of whether participants were informed about the algorithmic origin of the poem (Transparency) or not (Opacity). We discuss what these results convey about the performance of NLG algorithms in producing human-like text and propose methodologies to study such learning algorithms in human-agent experimental settings.
• New natural language generation (NLG) algorithms, like GPT-2, allegedly generate human-like text across diverse domains.
• We conducted two experiments (total N = 830) on human and machine behavior in the creative writing domain.
• Outputs by GPT-2 were either selected or randomly paired with human poems by novices (Study 1) or professionals (Study 2).
• In an incentivized version of the Turing Test, participants failed to reliably detect selected algorithmic creative text.
• People are averse to AI-generated poetry, regardless of whether they are informed about the origin of the poem. |
---|---|
ISSN: | 0747-5632 1873-7692 |
DOI: | 10.1016/j.chb.2020.106553 |