Reasoning with Latent Diffusion in Offline Reinforcement Learning
Main Authors:
Format: Journal Article
Language: English
Published: 12-09-2023
Summary: Offline reinforcement learning (RL) holds promise as a means to learn high-reward policies from a static dataset, without the need for further environment interactions. However, a key challenge in offline RL lies in effectively stitching together portions of suboptimal trajectories from the static dataset while avoiding the extrapolation errors that arise from a lack of support in the dataset. Existing approaches use conservative methods that are tricky to tune and struggle with multi-modal data (as we show), or rely on noisy Monte Carlo return-to-go samples for reward conditioning. In this work, we propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills. This facilitates learning a Q-function while avoiding extrapolation error via batch-constraining. The latent space is also expressive and copes gracefully with multi-modal data. We show that the learned temporally abstract latent space encodes richer task-specific information for offline RL tasks than raw state-actions do. This improves credit assignment and facilitates faster reward propagation during Q-learning. Our method demonstrates state-of-the-art performance on the D4RL benchmarks, particularly excelling in long-horizon, sparse-reward tasks.
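The summary describes Q-learning over compressed latent skills, with the Bellman backup restricted to latents that a generative prior considers in-support of the dataset. The sketch below is only an illustration of that batch-constrained backup, not the authors' implementation: `QNetwork`, `LatentPrior`, `N_CANDIDATES`, and the random tensors are hypothetical stand-ins, and the simple Gaussian prior head merely takes the place of the latent diffusion model the paper trains over trajectory chunks.

```python
# Minimal, illustrative sketch (not the paper's code) of batch-constrained
# Q-learning over latent skills. All module and variable names are hypothetical.
import torch
import torch.nn as nn

STATE_DIM, LATENT_DIM, N_CANDIDATES, GAMMA = 17, 8, 16, 0.99


class QNetwork(nn.Module):
    """Q(s, z): value of executing latent skill z from state s."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1)).squeeze(-1)


class LatentPrior(nn.Module):
    """Placeholder for a state-conditioned generative prior over skills.
    Here it is a simple Gaussian head; the paper uses latent diffusion."""
    def __init__(self):
        super().__init__()
        self.mu = nn.Linear(STATE_DIM, LATENT_DIM)

    @torch.no_grad()
    def sample(self, state, n):
        mu = self.mu(state)                        # (B, LATENT_DIM)
        mu = mu.unsqueeze(1).expand(-1, n, -1)     # (B, n, LATENT_DIM)
        return mu + 0.1 * torch.randn_like(mu)     # n candidate skills per state


def batch_constrained_target(q_target, prior, next_states, rewards, dones):
    """Bellman target where the max is taken only over latents sampled from
    the prior, i.e. skills that should stay in-support of the offline data."""
    batch = next_states.shape[0]
    z_cand = prior.sample(next_states, N_CANDIDATES)               # (B, K, Z)
    s_rep = next_states.unsqueeze(1).expand(-1, N_CANDIDATES, -1)  # (B, K, S)
    q_vals = q_target(
        s_rep.reshape(batch * N_CANDIDATES, -1),
        z_cand.reshape(batch * N_CANDIDATES, -1),
    ).reshape(batch, N_CANDIDATES)
    return rewards + GAMMA * (1.0 - dones) * q_vals.max(dim=1).values


# Toy usage on random tensors standing in for one offline batch.
q, q_tgt, prior = QNetwork(), QNetwork(), LatentPrior()
q_tgt.load_state_dict(q.state_dict())
opt = torch.optim.Adam(q.parameters(), lr=3e-4)

states = torch.randn(32, STATE_DIM)
latents = torch.randn(32, LATENT_DIM)   # skills encoded from dataset trajectories
rewards = torch.randn(32)
dones = torch.zeros(32)
next_states = torch.randn(32, STATE_DIM)

with torch.no_grad():
    target = batch_constrained_target(q_tgt, prior, next_states, rewards, dones)
loss = nn.functional.mse_loss(q(states, latents), target)
opt.zero_grad(); loss.backward(); opt.step()
```

The point of restricting the max to prior-sampled candidates is that the Q-target never queries skills the generative model would not produce, which is one way to realize the batch-constraining idea the abstract mentions; the actual architecture, diffusion sampling, and training details are in the paper.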
DOI: 10.48550/arxiv.2309.06599