In-context Reinforcement Learning with Algorithm Distillation
Main Authors: |  |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 25-10-2022 |
Subjects: |  |
Online Access: | Get full text |
Summary: | We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data. |
DOI: | 10.48550/arxiv.2210.14215 |
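
The summary sketches AD's recipe: generate a dataset of learning histories with a source RL algorithm, then train a causal transformer to autoregressively predict actions given the preceding learning history as context. As a rough, non-authoritative illustration of how such across-episode prediction targets might be assembled, the Python sketch below slices a flat multi-episode history of (observation, action, reward) steps into context windows and next-action targets. The function name `make_ad_training_pairs`, the flat-array layout, and the context length are assumptions made here for illustration, not the authors' released implementation.

```python
# Illustrative sketch (not the paper's code): turning a source RL algorithm's
# multi-episode learning history into across-episode next-action prediction pairs.
import numpy as np

def make_ad_training_pairs(observations, actions, rewards, context_len):
    """Slice one multi-episode learning history into (context, target-action) pairs.

    Each context holds the `context_len` preceding actions and rewards together
    with observations up to and including the current step; because `context_len`
    is chosen to span several episodes, a sequence model trained on these pairs
    can pick up how the source algorithm's policy improved over its run. The
    prediction target is the action the source algorithm took at the current step.
    """
    num_steps = len(actions)
    pairs = []
    for t in range(context_len, num_steps):
        context = {
            "observations": observations[t - context_len : t + 1],  # includes current obs
            "actions": actions[t - context_len : t],                # preceding actions
            "rewards": rewards[t - context_len : t],                # preceding rewards
        }
        target_action = actions[t]
        pairs.append((context, target_action))
    return pairs

# Toy usage: a 30-step history (several short episodes) with 4-dim observations
# and 5 discrete actions, all randomly generated for demonstration only.
rng = np.random.default_rng(0)
obs = rng.normal(size=(30, 4))
acts = rng.integers(0, 5, size=30)
rews = rng.normal(size=30)
pairs = make_ad_training_pairs(obs, acts, rews, context_len=10)
print(len(pairs), pairs[0][1])  # number of pairs and the first target action
```

A causal transformer would then be trained with a cross-entropy loss on these target actions; at evaluation time the context is filled with the model's own environment interactions, so the distilled policy can keep improving in-context without any parameter updates, as the summary describes.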