Context-Aware Meta-RL With Two-Stage Constrained Adaptation for Urban Driving

Bibliographic Details
Published in: IEEE Transactions on Vehicular Technology, Vol. 73; no. 2; pp. 1567-1581
Main Authors: Deng, Qi; Li, Ruyang; Hu, Qifu; Zhao, Yaqian; Li, Rengang
Format: Journal Article
Language: English
Published: New York IEEE 01-02-2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: End-to-end driving based on reinforcement learning (RL) has emerged as a promising approach in autonomous driving, with the potential to surpass human drivers. However, the exploration-exploitation dilemma of RL leads to low sample efficiency and high computational costs when training an optimal driving model, and the environment feedback-oriented learning mode limits the model's ability to handle unseen traffic scenarios, especially when dealing with the highly varied tasks of urban driving. In this article, we propose a context-aware meta-RL framework with two-stage constrained adaptation for challenging urban driving. First, a context-aware state representation enhanced by historical behavior is constructed to improve learning efficiency and robustness, where the most efficient context encoder is selected among four different forms. Next, an end-to-end meta-driving model with high generalization ability is built through parallel rollouts of multiple tasks and a simplified meta-training procedure. Then, a two-stage constrained adaptation strategy is designed to quickly transfer the meta model to new tasks while maintaining good performance, where the meta-training data are reused through context-based propensity estimation to constrain the optimization objective of new tasks. Extensive experiments are performed in the CARLA simulator with various urban scenarios, and the results validate the superiority of our proposed models in both learning efficiency and generalization compared to state-of-the-art algorithms.
ISSN: 0018-9545
1939-9359
DOI: 10.1109/TVT.2023.3312495
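
Illustrative note: the summary states that meta-training data are reused through context-based propensity estimation, but this record does not give the procedure. The sketch below shows one generic way such reuse is often done, and it is an assumption rather than the authors' method: a logistic-regression classifier separates meta-training contexts from new-task contexts, and its odds give importance weights for the old samples. The function names `fit_logistic`, `propensity_weights`, and `effective_sample_size`, and all hyperparameters, are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, steps=500, l2=1e-3):
    """Fit logistic regression by plain gradient descent; returns (w, b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(new task | context)
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def propensity_weights(meta_ctx, new_ctx):
    """Importance weights for meta-training samples (assumed scheme).

    Label meta-training contexts 0 and new-task contexts 1. The classifier
    odds exp(logit), corrected for the class sizes, estimate the density
    ratio p_new(c) / p_meta(c) for each meta-training context c.
    """
    X = np.vstack([meta_ctx, new_ctx])
    y = np.concatenate([np.zeros(len(meta_ctx)), np.ones(len(new_ctx))])
    w, b = fit_logistic(X, y)
    logits = np.clip(meta_ctx @ w + b, -30.0, 30.0)  # avoid overflow in exp
    ratio = np.exp(logits) * len(meta_ctx) / len(new_ctx)
    return ratio / ratio.mean()  # normalize so the weights average to 1


def effective_sample_size(weights):
    """Normalized effective sample size in (0, 1]; near 1 means similar data."""
    return weights.sum() ** 2 / (len(weights) * np.sum(weights ** 2))
```

Under these assumptions, a constrained adaptation objective could add the weighted loss on meta-training samples, `np.mean(weights * meta_losses)`, to the loss on new-task samples, and use the effective sample size to decide how strongly to trust the reused data.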