Robust Reinforcement Learning Under Dimension-Wise State Information Drop

Bibliographic Details
Published in:	IEEE Access, Vol. 12, pp. 135283-135299
Main Authors:	Kim, Gyeongmin; Kim, Jeonghye; Lee, Suyoung; Baek, Jaewoo; Moon, Howon; Shin, Sangheon; Sung, Youngchul
Format:	Journal Article
Language:	English
Published:	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024
Subjects:	Algorithms; Data loss; Delays; Dimension-wise state information drop; drop information embedding; masked observation; Program processors; Reinforcement learning; robust learning; Robust stability; Robustness; Terminology; Training; Transformers

Description
Summary:	Recent advancements in offline reinforcement learning (RL) have showcased the potential for leveraging static datasets to train optimal policies. However, real-world applications often face challenges due to missing or incomplete state information caused by imperfect sensor performance or intentional interlaces. We propose the Dimension-Wise Drop Decision Transformer (D3T), a novel framework designed to address dimension-wise data loss in sensor observations, enhancing the robustness of RL algorithms in real-world scenarios. D3T innovatively incorporates dimension-wise drop information embeddings within the Transformer architecture, facilitating effective decision-making even with incomplete observations. Our evaluation in the D4RL MuJoCo domain demonstrates that D3T significantly outperforms existing methods such as the Decision Transformer, particularly with substantial dimension-wise drops of observations. These results confirm D3T's capability in managing real-world imperfections in state observations and illustrate its potential to substantially expand the applicability of RL in more complex and dynamic environments.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3462803