Context-Aware Meta-RL With Two-Stage Constrained Adaptation for Urban Driving
Published in: | IEEE transactions on vehicular technology Vol. 73; no. 2; pp. 1567 - 1581 |
---|---|
Main Authors: | , , , , |
Format: | Journal Article |
Language: | English |
Published: | New York; IEEE; 01-02-2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects: | |
Summary: | End-to-end driving based on reinforcement learning (RL) has emerged as a promising approach in autonomous driving, with the potential to surpass human drivers. However, the exploration-exploitation dilemma of RL leads to low sample efficiency and high computational cost when training an optimal driving model, and the environment feedback-oriented learning mode limits the model's ability to handle unseen traffic scenarios, especially given the highly varied tasks of urban driving. In this article, we propose a context-aware meta-RL framework with two-stage constrained adaptation for challenging urban driving. First, a context-aware state representation enhanced by historical behavior is constructed to improve learning efficiency and robustness, where the most efficient context encoder is selected among four different forms. Next, an end-to-end meta-driving model with high generalization ability is built through parallel rollouts of multiple tasks and a simplified meta-training procedure. Then, a two-stage constrained adaptation strategy is designed to quickly transfer the meta model to new tasks while maintaining good performance, where the meta-training data are reused through context-based propensity estimation to constrain the optimization objective of new tasks. Extensive experiments are performed in the CARLA simulator with various urban scenarios, and the results validate the superiority of the proposed models in both learning efficiency and generalization compared to state-of-the-art algorithms. |
---|---|
ISSN: | 0018-9545, 1939-9359 |
DOI: | 10.1109/TVT.2023.3312495 |
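
The summary describes a state representation that is enhanced by historical behavior. As a rough illustration only, the sketch below appends a mean-pooled summary of the most recent actions and rewards to the current observation. The record does not specify the four encoder forms evaluated in the article, so the mean-pooling choice, the class name `ContextAugmentedState`, and all parameters here are assumptions for illustration rather than the authors' method.

```python
from collections import deque


class ContextAugmentedState:
    """Hypothetical helper: augments an observation with a summary of recent behavior.

    Assumption: the context is the element-wise mean of the last `history_len`
    (action, reward) records. This is one simple encoder form, not necessarily
    one of the forms used in the article.
    """

    def __init__(self, obs_dim, act_dim, history_len=4):
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        # A bounded deque silently drops the oldest record once it is full.
        self.history = deque(maxlen=history_len)

    def record(self, action, reward):
        """Store one step of historical behavior: the action taken and the reward received."""
        if len(action) != self.act_dim:
            raise ValueError("action has unexpected dimension")
        self.history.append(list(action) + [float(reward)])

    def context(self):
        """Return the mean-pooled context vector (zeros when no history exists yet)."""
        width = self.act_dim + 1
        if not self.history:
            return [0.0] * width
        n = len(self.history)
        return [sum(step[i] for step in self.history) / n for i in range(width)]

    def augment(self, obs):
        """Concatenate the raw observation with the current context vector."""
        if len(obs) != self.obs_dim:
            raise ValueError("observation has unexpected dimension")
        return list(obs) + self.context()


# Usage: two-dimensional observations and actions, with a history window of two steps.
encoder = ContextAugmentedState(obs_dim=2, act_dim=2, history_len=2)
encoder.record([0.5, 0.0], reward=1.0)
encoder.record([1.5, 1.0], reward=3.0)
state = encoder.augment([0.0, 0.0])
```

A policy network would then consume the augmented vector instead of the raw observation. A learned encoder (for example, a recurrent one) could replace the mean pooling without changing the interface.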