Offline Meta-Reinforcement Learning with Contrastive Prediction
Published in: Jisuanji kexue yu tansuo, Vol. 17, No. 8, pp. 1917-1927
Format: Journal Article
Language: Chinese
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press, 01-08-2023
Summary: Traditional reinforcement learning algorithms require large amounts of online interaction with the environment during training and cannot adapt effectively to changes in the task environment, which makes them difficult to apply to real-world problems. Offline meta-reinforcement learning offers an effective way to adapt quickly to a new task by using replay datasets of multiple tasks for offline policy learning. Applying offline meta-reinforcement learning to complex tasks faces two challenges. First, reinforcement learning algorithms overestimate the value of state-action pairs not contained in the dataset and therefore select non-optimal actions, resulting in poor performance. Second, meta-reinforcement learning algorithms must not only learn the policy but also perform robust and efficient task inference. To address these problems, this paper proposes an offline meta-reinforcement learning algorithm based on contrastive prediction. To cope with the problem of overestimation of value functions, the …
ISSN: 1673-9418
DOI: 10.3778/j.issn.1673-9418.2203074
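The abstract above is truncated before the method details, but the contrastive task inference it names is commonly realized with an InfoNCE-style objective: transitions drawn from the same task are pulled together in an embedding space while transitions from other tasks are pushed apart. The PyTorch sketch below illustrates that general idea only; it is not the paper's implementation, and `TaskEncoder`, the dimensions, and the positive/negative sampling scheme are all hypothetical.

```python
# A minimal sketch (NOT the paper's code) of InfoNCE-style contrastive
# task inference for offline meta-RL. Assumption: two transitions
# (s, a, r, s') sampled from the same task form a positive pair; the
# other rows in the batch act as negatives from different tasks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskEncoder(nn.Module):
    """Maps a transition (s, a, r, s') to a unit-norm task embedding z."""
    def __init__(self, state_dim, action_dim, z_dim=8, hidden=128):
        super().__init__()
        in_dim = 2 * state_dim + action_dim + 1  # s, a, r, s'
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, z_dim),
        )

    def forward(self, s, a, r, s_next):
        x = torch.cat([s, a, r, s_next], dim=-1)
        return F.normalize(self.net(x), dim=-1)

def info_nce_loss(anchor_z, positive_z, temperature=0.1):
    """InfoNCE: row i of anchor_z should match row i of positive_z
    (another transition from the same task); all off-diagonal rows
    serve as negatives."""
    logits = anchor_z @ positive_z.t() / temperature  # (B, B) similarities
    labels = torch.arange(anchor_z.size(0))           # diagonal = positives
    return F.cross_entropy(logits, labels)

# Toy usage with random data: one batch, two views per task.
if __name__ == "__main__":
    B, state_dim, action_dim = 32, 10, 4
    enc = TaskEncoder(state_dim, action_dim)
    s, s2 = torch.randn(B, state_dim), torch.randn(B, state_dim)
    a, a2 = torch.randn(B, action_dim), torch.randn(B, action_dim)
    r, r2 = torch.randn(B, 1), torch.randn(B, 1)
    sn, sn2 = torch.randn(B, state_dim), torch.randn(B, state_dim)
    z1 = enc(s, a, r, sn)       # anchor view of each task
    z2 = enc(s2, a2, r2, sn2)   # positive view: same task, other transition
    loss = info_nce_loss(z1, z2)
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")
```

In a full offline meta-RL pipeline, the learned embedding z would condition the policy and value networks, so that task identity inferred from a few offline transitions steers adaptation; how the paper combats value overestimation is cut off in the abstract and is not reconstructed here.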