Improving generative adversarial imitation learning with reward variance regularization
Published in: Machine Learning, Vol. 111, No. 3, pp. 977-995
Main Authors: , ,
Format: Journal Article
Language: English
Published: New York: Springer US, 01-03-2022 (Springer Nature B.V.)
Summary: Imitation learning aims at recovering expert policies from limited demonstration data. Generative Adversarial Imitation Learning (GAIL) employs the generative adversarial learning framework for imitation learning and has shown great potential. GAIL and its variants, however, are found to be highly sensitive to hyperparameters and hard to converge well in practice. One key issue is that the supervised-learning discriminator learns much faster than the reinforcement-learning generator, causing the generator gradient to vanish. Although GAIL is formulated as a zero-sum adversarial game, its ultimate goal is to learn the generator, so the discriminator should act more like a teacher than a true opponent; the learning of the discriminator should therefore take into account how the generator can learn. In this paper, we show that enhancing the gradient of the generator training is equivalent to increasing the variance of the fake reward provided by the discriminator output. We thus propose an improved version of GAIL, GAIL-VR, in which the discriminator also learns to avoid generator gradient vanishing through regularization of the fake reward variance. Experiments on various tasks, including locomotion tasks and Atari games, indicate that GAIL-VR can improve training stability and imitation scores.
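The abstract's central idea, regularizing the discriminator so that the variance of the fake reward stays large, can be illustrated with a short sketch. Below is a minimal PyTorch example of a GAIL discriminator loss augmented with such a variance term; the network architecture, the reward form r = -log(1 - D(s, a)), the coefficient `lam`, and the function names are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Simple MLP discriminator; outputs a logit for D(s, a) = P(expert)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def discriminator_loss_vr(disc, expert_obs, expert_act, gen_obs, gen_act, lam=0.1):
    """Standard GAIL discriminator loss minus lam * Var(fake reward).

    Subtracting the variance term encourages the discriminator to keep the
    fake-reward signal spread out, so the generator gradient does not vanish.
    `lam` is an illustrative coefficient, not a value from the paper.
    """
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(expert_obs, expert_act)
    gen_logits = disc(gen_obs, gen_act)

    # Binary classification terms: expert samples -> 1, generator samples -> 0.
    cls_loss = (bce(expert_logits, torch.ones_like(expert_logits))
                + bce(gen_logits, torch.zeros_like(gen_logits)))

    # Fake reward in the common GAIL form: r = -log(1 - D(s, a)).
    fake_reward = -torch.log(1.0 - torch.sigmoid(gen_logits) + 1e-8)

    return cls_loss - lam * fake_reward.var()
```

Subtracting the variance term rewards the discriminator for keeping its fake-reward signal informative, which is one plausible way to realize the paper's stated goal of avoiding generator gradient vanishing.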
ISSN: 0885-6125 (print); 1573-0565 (electronic)
DOI: 10.1007/s10994-021-06083-7