Novel Methods for Efficient Dialogue Policy Learning by Improving Agent-User Interaction
Format: Dissertation
Language: English
Published: ProQuest Dissertations & Theses, 01-01-2019
Summary: Building a dialogue system that can converse with humans across different tasks remains one of the challenges of Natural Language Processing. A critical task in this area is dialogue management (DM), which selects a collection of actions to fulfill users' information-seeking needs naturally and rapidly. Rule-based DM approaches are inflexible and hard to apply to complex scenarios. Recent advances in deep reinforcement learning (RL) make it attractive for DM: RL is more robust to errors and enables a dialogue system to automatically learn to respond optimally. However, because of sparse reward signals and a large search space, training dialogue agents via RL usually requires a large number of agent-user interactions to achieve good performance. Although user simulators can alleviate this problem, they usually lack the conversational complexity of human users. This thesis proposes three novel methods to mitigate these issues from the agent and user perspectives.

From the agent perspective, an Adversarial Advantage Actor-Critic (A3C) model is first proposed to recover a meaningful reward function from a corpus of human demonstrations, complementing manually defined reward functions. A3C, however, is inefficient in complex domains that consist of a set of sub-tasks to be achieved collectively. A hierarchical RL method, combining RL with hierarchical task decomposition, is therefore proposed to decompose a complex dialogue into multiple granularities so that, at each step, the agent focuses on a smaller and easier task. Both A3C and hierarchical RL are shown to learn a pragmatic policy efficiently in real-world applications.

From the user perspective, a model-based RL algorithm termed Deep Dyna-Q (DDQ) is proposed: it incorporates a world model into the dialogue agent to mimic real user responses and provide simulated experiences. DDQ is demonstrated to enable a dialogue agent to learn with human users far more efficiently.
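As a rough illustration of the Deep Dyna-Q scheme described above, the sketch below shows a generic Dyna-style training loop in which direct RL on real user dialogues, world-model learning, and planning on simulated dialogues alternate. All names (`agent`, `world_model`, `real_buffer`, and the environment-like interface of the world model) are hypothetical placeholders assumed for illustration, not the thesis's implementation.

```python
# Minimal Deep Dyna-Q sketch (hypothetical interfaces throughout).
# One outer iteration = direct RL + world-model learning + K planning episodes.
def ddq_iteration(agent, world_model, user, real_buffer, sim_buffer, K=5):
    # --- Direct RL: one dialogue with the real user ---------------------
    state, done = user.reset(), False
    while not done:
        action = agent.act(state)                      # e.g. epsilon-greedy
        next_state, reward, done = user.step(action)
        real_buffer.append((state, action, reward, next_state, done))
        state = next_state
    agent.train(real_buffer.sample())                  # e.g. a DQN update

    # --- World-model learning: supervised fit on real transitions -------
    world_model.fit(real_buffer.sample())              # learn to predict reward,
                                                       # next state, and termination

    # --- Planning: K simulated dialogues against the world model --------
    for _ in range(K):
        state, done = world_model.reset(), False
        while not done:
            action = agent.act(state)
            next_state, reward, done = world_model.step(action)  # simulated user
            sim_buffer.append((state, action, reward, next_state, done))
            state = next_state
        agent.train(sim_buffer.sample())               # same update rule, simulated data
```

The key design choice is that the planning phase reuses the ordinary RL update on simulated transitions, so each costly real interaction is amortized over K cheap simulated ones.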
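The hierarchical decomposition from the agent perspective can be sketched similarly as a two-level loop: a top-level policy picks a sub-task, and a low-level policy picks primitive dialogue actions until that sub-task terminates. The interfaces (`meta_controller`, `controller`, `env.subtask_finished`) and the 0/1 intrinsic reward are assumptions chosen for a minimal example, in the general style of hierarchical deep RL rather than the thesis's exact design.

```python
# Minimal hierarchical-RL sketch (hypothetical interfaces throughout).
def hrl_episode(meta_controller, controller, env):
    state, done = env.reset(), False
    while not done:
        subtask = meta_controller.act(state)           # top level: choose a sub-task
        sub_start, extrinsic, sub_done = state, 0.0, False
        while not (sub_done or done):
            action = controller.act(state, subtask)    # low level: primitive action
            next_state, reward, done = env.step(action)
            sub_done = env.subtask_finished(subtask)
            intrinsic = 1.0 if sub_done else 0.0       # bonus for finishing the sub-task
            controller.train(state, subtask, action,
                             reward + intrinsic, next_state)
            extrinsic += reward
            state = next_state
        # top level is trained on the reward accumulated over the whole sub-task
        meta_controller.train(sub_start, subtask, extrinsic, state)
```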
ISBN: 9781392678367; 1392678366