Offline Meta-Reinforcement Learning with Contrastive Prediction

Bibliographic Details
Published in: Jisuanji kexue yu tansuo, Vol. 17, no. 8, pp. 1917-1927
Main Authors: HAN Xu, WU Feng
Format: Journal Article
Language: Chinese
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press, 01-08-2023
Description
Summary: Traditional reinforcement learning algorithms require large amounts of online interaction with the environment for training and cannot effectively adapt to changes in the task environment, making them difficult to apply to real-world problems. Offline meta-reinforcement learning provides an effective way to adapt quickly to a new task by using replay datasets of multiple tasks for offline policy learning. Applying offline meta-reinforcement learning to complex tasks faces two challenges. Firstly, reinforcement learning algorithms overestimate the value of state-action pairs not contained in the dataset and thus select non-optimal actions, resulting in poor performance. Secondly, meta-reinforcement learning algorithms must not only learn the policy but also perform robust and efficient task inference. To address these problems, this paper proposes an offline meta-reinforcement learning algorithm based on contrastive prediction. To cope with the problem of overestimation of value functions, the ...
ISSN: 1673-9418
DOI: 10.3778/j.issn.1673-9418.2203074
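
The summary above credits contrastive prediction with providing task inference, but the record is truncated before any algorithmic detail. Purely as an illustration of the general idea, and not as the paper's actual method, the sketch below shows a common contrastive (InfoNCE-style) task encoder for offline meta-RL: transitions drawn from the same task are pulled together in latent space, while transitions from other tasks in the batch act as negatives. The PyTorch framing, module names, and dimensions are all assumptions made for this example.

# Hypothetical sketch: contrastive task inference for offline meta-RL.
# Transitions (s, a, r, s') from the same task are treated as positives;
# transitions from other tasks in the batch serve as in-batch negatives.
# All names and dimensions are illustrative, not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskEncoder(nn.Module):
    """Maps a single transition (s, a, r, s') to a unit-norm task embedding."""

    def __init__(self, state_dim: int, action_dim: int, latent_dim: int = 16):
        super().__init__()
        in_dim = 2 * state_dim + action_dim + 1  # s, a, r, s'
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, s, a, r, s_next):
        x = torch.cat([s, a, r, s_next], dim=-1)
        return F.normalize(self.net(x), dim=-1)


def info_nce_loss(anchor_z, positive_z, temperature: float = 0.1):
    """InfoNCE: each anchor should match its own positive among all candidates."""
    logits = anchor_z @ positive_z.t() / temperature          # (B, B) similarities
    labels = torch.arange(anchor_z.size(0), device=anchor_z.device)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    B, state_dim, action_dim = 32, 11, 3
    enc = TaskEncoder(state_dim, action_dim)

    # Two transitions from the same task form an (anchor, positive) pair;
    # rows belonging to different tasks act as negatives within the batch.
    s1, s2 = torch.randn(B, state_dim), torch.randn(B, state_dim)
    a1, a2 = torch.randn(B, action_dim), torch.randn(B, action_dim)
    r1, r2 = torch.randn(B, 1), torch.randn(B, 1)
    ns1, ns2 = torch.randn(B, state_dim), torch.randn(B, state_dim)

    loss = info_nce_loss(enc(s1, a1, r1, ns1), enc(s2, a2, r2, ns2))
    loss.backward()
    print(f"contrastive task-inference loss: {loss.item():.4f}")

In a full offline meta-RL pipeline, the resulting task embedding would condition the policy and value networks during offline training and at adaptation time; how the paper combines this with its treatment of value overestimation is not recoverable from the truncated summary.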