Acoustic-phonetic properties of Siri- and human-directed speech

Bibliographic Details
Published in: Journal of Phonetics Vol. 90; p. 101123
Main Authors: Cohn, Michelle; Segedin, Bruno Ferenc; Zellou, Georgia
Format: Journal Article
Language: English
Published: Elsevier Ltd 01-01-2022
Description
Summary:
• Measured acoustic properties of speech produced to Siri vs. a human.
• Siri-DS tends to be louder, has a lower mean f0, and a smaller f0 range.
• Targeted vowel hyperarticulation in Siri-DS in coda repairs.
• “Presumed” and “actual” communicative barriers affect Siri-DS.
• Supports listener-intelligibility accounts, counters technology equivalence theories.

Millions of people engage in spoken interactions with voice activated artificially intelligent (voice-AI) systems in their everyday lives. This study explores whether speakers have a voice-AI-specific register, relative to their speech toward an adult human. Furthermore, this study tests if speakers have targeted error correction strategies for voice-AI and human interlocutors. In a pseudo-interactive task with pre-recorded Siri and human voices, participants produced target words in sentences. In each turn, following an initial production and feedback from the interlocutor, participants repeated the sentence in one of three response types: after correct word identification, a coda error, or a vowel error made by the interlocutor. Across two studies, the rate of comprehension errors made by both interlocutors was varied (lower vs. higher error rate). Register differences are found: participants speak louder, with a lower mean f0, and with a smaller f0 range in Siri-DS. Many differences in Siri-DS emerged as dynamic adjustments over the course of the interaction. Additionally, error rate shapes how register differences are realized. One targeted error correction was observed: speakers produce more vowel hyperarticulation in coda repairs in Siri-DS. Taken together, these findings contribute to our understanding of speech register and the dynamic nature of talker-interlocutor interactions.
ISSN: 0095-4470, 1095-8576
DOI: 10.1016/j.wocn.2021.101123