Value Prediction Network
Main Authors: | , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 11-07-2017 |
Subjects: | |
Summary: | This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation. |
---|---|
DOI: | 10.48550/arxiv.1707.03497 |
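
The summary describes values as discounted sums of rewards and mentions short-lookahead planning over a learned model. The sketch below illustrates those two ideas on a toy problem only. It is not the paper's architecture or planning rule: `toy_model`, `toy_value`, the integer "abstract state", the discount factor, and the plain max backup are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

GAMMA = 0.9  # assumed discount factor, not taken from the record


def discounted_return(rewards: Sequence[float], gamma: float = GAMMA) -> float:
    """Discounted sum of rewards: r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def toy_model(state: int, option: int) -> Tuple[float, int]:
    """Hypothetical stand-in for a learned model: returns (reward, next abstract state)."""
    next_state = state + option
    reward = 1.0 if next_state == 3 else 0.0
    return reward, next_state


def toy_value(state: int) -> float:
    """Hypothetical stand-in for a learned value estimate of an abstract state."""
    return 0.0


def q_lookahead(
    state: int,
    option: int,
    depth: int,
    options: Sequence[int],
    model: Callable[[int, int], Tuple[float, int]] = toy_model,
    value: Callable[[int], float] = toy_value,
    gamma: float = GAMMA,
) -> float:
    """Estimate the value of taking `option` by expanding the model `depth` steps.

    Uses a plain max backup over options at each step (an assumption).
    """
    reward, next_state = model(state, option)
    if depth <= 1:
        return reward + gamma * value(next_state)
    best_next = max(
        q_lookahead(next_state, o, depth - 1, options, model, value, gamma)
        for o in options
    )
    return reward + gamma * best_next


def best_option(state: int, depth: int, options: Sequence[int]) -> int:
    """Pick the option with the highest lookahead estimate."""
    return max(options, key=lambda o: q_lookahead(state, o, depth, options))
```

For example, `best_option(1, 1, (1, 2))` selects option `2`, since only that option reaches the rewarding state in one step. Note that the planner never predicts observations; it only queries rewards, next abstract states, and values, which is the contrast the summary draws with observation-prediction models.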