Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback

Bibliographic Details
Published in: IEEE Transactions on Cybernetics, Vol. 50, No. 11, pp. 4670-4679
Main Authors: Rizvi, Syed Ali Asad; Lin, Zongli
Format: Journal Article
Language: English
Published: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), United States, 01-11-2020
Description
Summary: In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters using only measurable input-output data, without requiring model information. A state parametrization scheme is presented that reconstructs the system state from filtered input and output signals. Based on this parametrization, two new output feedback adaptive dynamic programming Bellman equations are derived for the LQR problem, one based on policy iteration (PI) and the other on value iteration (VI). Unlike existing output feedback methods for continuous-time systems, the proposed method obviates the need for discrete approximation. In contrast with static output feedback controllers, it can also handle systems that are state feedback stabilizable but not static output feedback stabilizable. An advantage of this scheme is that it is immune to the exploration bias issue. Moreover, it does not require a discounted cost function and, thus, ensures both closed-loop stability and optimality of the solution. Compared with earlier output feedback results, the proposed VI method does not require an initially stabilizing policy. We show that the estimates of the control parameters converge to those obtained by solving the LQR algebraic Riccati equation. A comprehensive simulation study is carried out to verify the proposed algorithms.
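
For context, the following is a minimal model-based sketch of policy iteration for continuous-time LQR (Kleinman's algorithm), illustrating the fixed point, the algebraic Riccati equation solution, that iterations of this kind converge to. The plant matrices (A, B) and weights (Q, R) below are hypothetical examples; the paper's actual contribution is a model-free output feedback scheme that learns the controller without knowing A and B, which this sketch does not reproduce.

```python
# Model-based policy iteration (Kleinman's algorithm) for continuous-time LQR.
# Hypothetical example system; the paper's method is model-free and uses only
# input-output data, whereas this sketch assumes (A, B) are known.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-2.0, -3.0]])  # open-loop stable example plant
B = np.array([[0.0], [1.0]])
Q = np.eye(2)          # state weighting
R = np.array([[1.0]])  # input weighting

K = np.zeros((1, 2))   # initial stabilizing gain (valid here since A is Hurwitz)
for _ in range(20):
    Ak = A - B @ K
    # Policy evaluation: solve the Lyapunov equation
    #   Ak' P + P Ak + Q + K' R K = 0  for the cost matrix P.
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: greedy gain K = R^{-1} B' P for the evaluated cost.
    K = np.linalg.solve(R, B.T @ P)

# The iterates converge to the solution of the LQR algebraic Riccati equation.
P_are = solve_continuous_are(A, B, Q, R)
print(np.allclose(P, P_are))  # True
```

Note that this model-based PI requires an initially stabilizing gain; as the abstract states, the paper's VI variant removes that requirement.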
ISSN: 2168-2267, 2168-2275
DOI: 10.1109/TCYB.2018.2886735