Exophora Resolution of Linguistic Instructions with a Demonstrative based on Real-World Multimodal Information

Bibliographic Details
Published in: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pp. 2617-2623
Main Authors: Oyama, Akira; Hasegawa, Shoichi; Nakagawa, Hikaru; Taniguchi, Akira; Hagiwara, Yoshinobu; Taniguchi, Tadahiro
Format: Conference Proceeding
Language: English
Published: IEEE 28-08-2023
Description
Summary: To enable a robot to provide support in a home environment through human-robot interaction, exophora resolution is crucial for accurately identifying the target of ambiguous linguistic instructions, which may include a demonstrative, such as "Take that one". Unlike endophora resolution, which involves predicting the corresponding word from given sentences, exophora resolution necessitates comprehensive utilization of external real-world information to identify and disambiguate the target from the on-site environment. This study aims to resolve ambiguity in language instructions containing a demonstrative through exophora resolution, utilizing real-world multimodal information. The robot accomplishes this by using three types of information: 1) object categories, 2) demonstratives, and 3) pointing, as well as knowledge about objects obtained from the robot's pre-exploration of the environment. We evaluated the accuracy of object identification under multiple conditions by identifying a user-indicated object in a field that mimics a home environment. Our results demonstrate that our proposed method of exophora resolution using multimodal information can identify the target with two to three times higher accuracy than baseline methods in cases where information is missing.
ISSN: 1944-9437
DOI: 10.1109/RO-MAN57019.2023.10309487
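
Illustrative sketch: the summary describes combining three cues (object category, demonstrative, and pointing) to pick out the indicated object, including cases where some information is missing. The record does not state how the cues are combined, so the Python below is only a minimal sketch under assumptions: each cue is taken to be a per-object score, the scores are fused by multiplying normalized values, and a missing cue is simply skipped. The names `fuse_cues`, `resolve_target`, and the object labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Sequence

Scores = Dict[str, float]


def _normalize(scores: Scores) -> Scores:
    """Scale scores to sum to 1; fall back to uniform if all are zero."""
    total = sum(scores.values())
    if total <= 0:
        return {obj: 1.0 / len(scores) for obj in scores}
    return {obj: value / total for obj, value in scores.items()}


def fuse_cues(
    objects: Sequence[str],
    category: Optional[Scores] = None,
    demonstrative: Optional[Scores] = None,
    pointing: Optional[Scores] = None,
) -> Scores:
    """Fuse the available cues by a normalized product (an assumption).

    A cue passed as None is treated as missing and contributes nothing,
    which is equivalent to a uniform score over all candidate objects.
    """
    if not objects:
        raise ValueError("at least one candidate object is required")
    fused = {obj: 1.0 for obj in objects}
    for cue in (category, demonstrative, pointing):
        if cue is None:
            continue  # missing information: skip this cue
        cue_norm = _normalize({obj: cue.get(obj, 0.0) for obj in objects})
        for obj in objects:
            fused[obj] *= cue_norm[obj]
    return _normalize(fused)


def resolve_target(objects: Sequence[str], **cues: Optional[Scores]) -> str:
    """Return the candidate object with the highest fused score."""
    fused = fuse_cues(objects, **cues)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    candidates = ["cup_kitchen", "cup_table", "book_table"]
    # Hypothetical scores; the demonstrative cue is missing in this call.
    category = {"cup_kitchen": 0.5, "cup_table": 0.5, "book_table": 0.0}
    pointing = {"cup_kitchen": 0.1, "cup_table": 0.6, "book_table": 0.3}
    print(resolve_target(candidates, category=category, pointing=pointing))
```

In this sketch the category cue rules out the book and the pointing cue separates the two cups, so either cue alone would leave the target ambiguous. Skipping a missing cue keeps the remaining cues usable, which is one simple way to think about the missing-information conditions mentioned in the summary.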