Unmanned Aerial Vehicle Path Planning Algorithm Based on Deep Reinforcement Learning in Large-Scale and Dynamic Environments

Path planning is one of the key technologies for autonomous flight of Unmanned Aerial Vehicle. Traditional path planning algorithms have some limitations and deficiencies in the complex and dynamic environment. In this article, we propose a deep reinforcement learning approach for three-dimensional...

Full description

Saved in:

Bibliographic Details
Published in:	IEEE access Vol. 9; pp. 24884 - 24900
Main Authors:	Xie, Ronglei, Meng, Zhijun, Wang, Lifeng, Li, Haochen, Wang, Kaipeng, Wu, Zhe
Format:	Journal Article
Language:	English
Published:	Piscataway IEEE 2021 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms Deep learning Deep reinforcement learning Dimensional stability Heuristic algorithms Machine learning Markov processes Obstacle avoidance Path planning recurrent neural network Recurrent neural networks Reinforcement learning Safety Unknown environments Unmanned aerial vehicles Vehicle dynamics
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Path planning is one of the key technologies for autonomous flight of Unmanned Aerial Vehicle. Traditional path planning algorithms have some limitations and deficiencies in the complex and dynamic environment. In this article, we propose a deep reinforcement learning approach for three-dimensional path planning by utilizing the local information and relative distance without global information. UAV can obtain the limited environmental information nearby in the actual scenario with limited sensor capabilities. Therefore, path planning can be formulated as a Partially Observable Markov Decision Process. The recurrent neural network with temporal memory is constructed to address the partial observability problem by extracting crucial information from historical state-action sequences. We develop an action selection strategy that combines the current reward value and the state-action value to reduce the meaningless exploration. In addition, we construct two sample memory pools and propose an adaptive experience replay mechanism based on the frequency of failure. The simulation experiment results show that our method has significant improvements over Deep Q-Network and Deep Recurrent Q-Network in terms of stability and learning efficiency. Our approach successfully plans a reasonable three-dimensional path in the large-scale and complex environment, and has the perfect ability to avoid obstacles.in the unknown environment.
ISSN:	2169-3536 2169-3536
DOI:	10.1109/ACCESS.2021.3057485