Effectiveness of Warm-Start PPO for Guidance with Highly Constrained Nonlinear Fixed-Wing Dynamics

Bibliographic Details
Published in: 2023 American Control Conference (ACC), pp. 3288-3295
Main Authors: Coletti, Christian T., Williams, Kyle A., Lehman, Hannah C., Kakish, Zahi M., Whitten, Daniel, Parish, Julie J.
Format: Conference Proceeding
Language: English
Published: American Automatic Control Council, 31-05-2023
Description
Summary: Reinforcement learning (RL) may enable fixed-wing unmanned aerial vehicle (UAV) guidance to achieve more agile and complex objectives than typical methods. However, RL has so far struggled to achieve even minimal success on this problem; fixed-wing flight with RL-based guidance has only been demonstrated in the literature with reduced state and/or action spaces. In order to achieve full 6-DOF RL-based guidance, this study begins training with imitation learning from classical guidance, a method known as warm-starting (WS), before further training using Proximal Policy Optimization (PPO). We show that warm-starting is critical to successful RL performance on this problem. PPO alone achieved a 2% success rate in our experiments. Warm-starting alone achieved 32% success. Warm-starting plus PPO achieved 57% success over all policies, with 40% of policies achieving 94% success.
ISSN: 2378-5861
DOI: 10.23919/ACC55779.2023.10156267
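
The abstract describes a two-phase training scheme: behavioral cloning from a classical guidance law (warm-starting), followed by on-policy PPO fine-tuning. The sketch below is a minimal illustration of that pipeline, assuming Stable-Baselines3 (>= 2.x) with Gymnasium and PyTorch; the Pendulum-v1 environment, the hand-written classical_guidance controller, and all hyperparameters are placeholders standing in for the paper's 6-DOF fixed-wing setup, not the authors' implementation.

# Sketch: warm-start a PPO policy via behavioral cloning on a classical controller,
# then continue with PPO. Environment, controller, and hyperparameters are placeholders.
import numpy as np
import torch
import gymnasium as gym
from stable_baselines3 import PPO


def classical_guidance(obs):
    """Stand-in 'classical guidance' law (a crude PD-style stabilizer for Pendulum-v1);
    the paper instead uses a classical fixed-wing guidance law as the expert."""
    cos_th, sin_th, th_dot = obs
    torque = -(2.0 * sin_th + 0.5 * th_dot)
    return np.clip([torque], -2.0, 2.0).astype(np.float32)


def collect_expert_data(env, n_steps=5_000):
    """Roll out the classical controller and record (observation, action) pairs."""
    obs_buf, act_buf = [], []
    obs, _ = env.reset(seed=0)
    for _ in range(n_steps):
        act = classical_guidance(obs)
        obs_buf.append(obs)
        act_buf.append(act)
        obs, _, terminated, truncated, _ = env.step(act)
        if terminated or truncated:
            obs, _ = env.reset()
    return torch.as_tensor(np.array(obs_buf)), torch.as_tensor(np.array(act_buf))


def warm_start(model, expert_obs, expert_act, epochs=20, batch_size=256, lr=3e-4):
    """Behavioral cloning: maximize the PPO policy's log-likelihood of expert actions."""
    optimizer = torch.optim.Adam(model.policy.parameters(), lr=lr)
    n = expert_obs.shape[0]
    for _ in range(epochs):
        perm = torch.randperm(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            obs_b = expert_obs[idx].to(model.device)
            act_b = expert_act[idx].to(model.device)
            _, log_prob, _ = model.policy.evaluate_actions(obs_b, act_b)
            loss = -log_prob.mean()  # negative log-likelihood of expert actions
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()


env = gym.make("Pendulum-v1")                      # stand-in for a 6-DOF fixed-wing guidance env
model = PPO("MlpPolicy", env, verbose=0)

expert_obs, expert_act = collect_expert_data(env)  # phase 1: imitate classical guidance (warm start)
warm_start(model, expert_obs, expert_act)

model.learn(total_timesteps=100_000)               # phase 2: PPO fine-tuning from the warm-started policy

The design intent mirrors the abstract's finding: PPO trained from a random initialization rarely succeeds on the full problem, so the cloning phase places the policy near the classical controller's behavior before PPO refines it.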