A Self-Tuning Actor-Critic Algorithm
Format: Journal Article
Language: English
Published: 28-02-2020
Summary: Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain. In this paper, we take a step towards addressing this issue by using metagradients to automatically adapt hyperparameters online via meta-gradient descent (Xu et al., 2018). We apply our algorithm, Self-Tuning Actor-Critic (STAC), to self-tune all the differentiable hyperparameters of an actor-critic loss function, to discover auxiliary tasks, and to improve off-policy learning using a novel leaky V-trace operator. STAC is simple to use, sample efficient, and does not require a significant increase in compute. Ablative studies show that the overall performance of STAC improves as more hyperparameters are adapted. When applied to the Arcade Learning Environment (Bellemare et al., 2012), STAC improved the median human-normalized score in 200M steps from 243% to 364%. When applied to the DM Control suite (Tassa et al., 2018), STAC improved the mean score in 30M steps from 217 to 389 when learning with features, from 108 to 202 when learning from pixels, and from 195 to 295 in the Real-World Reinforcement Learning Challenge (Dulac-Arnold et al., 2020).
DOI: 10.48550/arxiv.2002.12928
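
The summary above describes adapting differentiable hyperparameters online by meta-gradient descent. Below is a minimal illustrative sketch of that general idea in JAX, not the paper's STAC implementation: a single entropy-bonus weight `eta` is treated as a meta-parameter and updated by differentiating an outer loss through one inner gradient step. All names and batch fields (`inner_loss`, `outer_loss`, `batch["obs"]`, the learning rates) are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

def inner_loss(theta, eta, batch):
    # Toy actor-critic-style surrogate: policy-gradient term plus an
    # eta-weighted entropy bonus; eta is the differentiable hyperparameter.
    logits = batch["obs"] @ theta
    logp = jax.nn.log_softmax(logits)
    entropy = -jnp.sum(jnp.exp(logp) * logp, axis=-1).mean()
    chosen = logp[jnp.arange(batch["act"].shape[0]), batch["act"]]
    pg = -(chosen * batch["adv"]).mean()
    return pg - eta * entropy

def outer_loss(theta, batch):
    # Outer objective evaluated without the entropy bonus (one common choice).
    logits = batch["obs"] @ theta
    logp = jax.nn.log_softmax(logits)
    chosen = logp[jnp.arange(batch["act"].shape[0]), batch["act"]]
    return -(chosen * batch["adv"]).mean()

def meta_step(theta, eta, batch, lr=1e-2, meta_lr=1e-3):
    # Inner update: one SGD step on the eta-parameterised loss.
    def inner_update(eta_):
        g = jax.grad(inner_loss)(theta, eta_, batch)
        return theta - lr * g
    # Outer update: differentiate the outer loss through the inner step w.r.t. eta.
    meta_grad = jax.grad(lambda e: outer_loss(inner_update(e), batch))(eta)
    return inner_update(eta), eta - meta_lr * meta_grad

# Example usage with random data, purely to show the update shapes.
key = jax.random.PRNGKey(0)
theta = jax.random.normal(key, (8, 4)) * 0.01
eta = jnp.asarray(0.01)
batch = {
    "obs": jax.random.normal(key, (32, 8)),
    "act": jax.random.randint(key, (32,), 0, 4),
    "adv": jax.random.normal(key, (32,)),
}
theta, eta = meta_step(theta, eta, batch)
```

The paper applies this idea jointly to all differentiable hyperparameters of the actor-critic loss, to auxiliary-task weights, and to a leaky V-trace operator; the sketch above tunes only a single coefficient to keep the mechanism visible.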