In-context Reinforcement Learning with Algorithm Distillation

Bibliographic Details
Main Authors: Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih
Format: Journal Article
Language: English
Published: 25-10-2022
Description
Summary: We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
DOI:10.48550/arxiv.2210.14215
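To make the method described in the summary concrete, below is a minimal sketch of AD-style training in PyTorch. It is an illustration under stated assumptions, not the authors' implementation: observations and actions are assumed discrete, and the (observation, previous action, previous reward) token scheme, the `histories` batch format, the `ADTransformer` and `train_ad` names, and all hyperparameters are hypothetical choices for exposition.

```python
# Minimal sketch of Algorithm Distillation (AD) training, assuming discrete
# observations/actions and a toy batch format; names and hyperparameters are
# illustrative, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ADTransformer(nn.Module):
    """Causal transformer that predicts the next action from a cross-episode
    learning history of (observation, previous action, previous reward) tokens."""

    def __init__(self, n_obs, n_act, d_model=128, n_layer=4, n_head=4, max_len=1024):
        super().__init__()
        self.obs_emb = nn.Embedding(n_obs, d_model)
        self.act_emb = nn.Embedding(n_act, d_model)
        self.rew_emb = nn.Linear(1, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_head, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layer)
        self.head = nn.Linear(d_model, n_act)

    def forward(self, obs, prev_act, prev_rew):
        # obs, prev_act: (B, T) int tensors; prev_rew: (B, T) float tensor.
        # Each timestep is summarized as one token: e(o_t) + e(a_{t-1}) + e(r_{t-1}).
        B, T = obs.shape
        pos = torch.arange(T, device=obs.device)
        x = (
            self.obs_emb(obs)
            + self.act_emb(prev_act)
            + self.rew_emb(prev_rew.unsqueeze(-1))
            + self.pos_emb(pos)
        )
        # Boolean causal mask: True marks future positions that may not be attended to.
        causal_mask = torch.triu(
            torch.ones(T, T, device=obs.device, dtype=torch.bool), diagonal=1
        )
        h = self.encoder(x, mask=causal_mask)
        return self.head(h)  # logits over the action taken at each step


def train_ad(model, histories, epochs=10, lr=3e-4):
    """Distill the source RL algorithm by autoregressively predicting each action
    given the preceding learning history.

    `histories` is assumed to yield (obs, prev_act, prev_rew, target_act) tensor
    batches sliced from across-episode learning histories of the source algorithm."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, prev_act, prev_rew, target_act in histories:
            logits = model(obs, prev_act, prev_rew)
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), target_act.reshape(-1)
            )
            opt.zero_grad()
            loss.backward()
            opt.step()
```

At evaluation time the trained network's parameters are frozen: it is rolled out in a new task and conditions on its own accumulating cross-episode history, so the policy improves purely in-context, as described in the summary above.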