Turn-taking cues in task-oriented dialogue

Bibliographic Details
Published in: Computer Speech & Language, Vol. 25, no. 3, pp. 601-634
Main Authors: Gravano, Agustín; Hirschberg, Julia
Format: Journal Article
Language: English
Published: Kidlington: Elsevier Ltd, 01-07-2011
Description
Summary:
▶ Seven turn-yielding cues precede turn changes in spontaneous task-oriented dialogue.
▶ Cues are prosodic, acoustic, and lexico-syntactic events.
▶ Cues are linearly correlated with the occurrence of turn-taking attempts.
▶ Six backchannel-inviting cues precede the occurrence of a backchannel.
▶ Results will be useful for turn management in future IVR systems.

As interactive voice response systems become more prevalent and provide increasingly more complex functionality, it becomes clear that the challenges facing such systems are not solely in their synthesis and recognition capabilities. Issues such as the coordination of turn exchanges between system and user also play an important role in system usability. In particular, both systems and users have difficulty determining when the other is taking or relinquishing the turn.

In this paper, we seek to identify turn-taking cues correlated with human–human turn exchanges which are automatically computable. We compare the presence of potential prosodic, acoustic, and lexico-syntactic turn-yielding cues in prosodic phrases preceding turn changes (smooth switches) vs. turn retentions (holds) vs. backchannels in the Columbia Games Corpus, a large corpus of task-oriented dialogues, to determine which features reliably distinguish between these three. We identify seven turn-yielding cues, all of which can be extracted automatically, for future use in turn generation and recognition in interactive voice response (IVR) systems. Testing Duncan’s (1972) hypothesis that these turn-yielding cues are linearly correlated with the occurrence of turn-taking attempts, we further demonstrate that, the greater the number of turn-yielding cues that are present, the greater the likelihood that a turn change will occur. We also identify six cues that precede backchannels, which will also be useful for IVR backchannel generation and recognition; these cues correlate with backchannel occurrence in a quadratic manner. We find similar results for overlapping and for non-overlapping speech.
ISSN: 0885-2308, 1095-8363
DOI: 10.1016/j.csl.2010.10.003