Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 30, No. 5, pp. 1523-1536
Main Authors:
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01-05-2019
Subjects:
Summary: Approximate dynamic programming (ADP) and reinforcement learning (RL) have emerged as important tools in the design of optimal and adaptive control systems. Most existing RL and ADP methods rely on full-state feedback, a requirement that is often difficult to satisfy in practical applications. Output feedback methods are therefore more desirable, as they relax this requirement. In this paper, we present a new output feedback-based Q-learning approach to solving the linear quadratic regulation (LQR) control problem for discrete-time systems. The proposed scheme is completely online in nature and works without requiring knowledge of the system dynamics. More specifically, a new representation of the LQR Q-function is developed in terms of the input-output data. Based on this new Q-function representation, output feedback LQR controllers are designed. We present two output feedback iterative Q-learning algorithms based on the policy iteration and the value iteration methods. This scheme has the advantage that it does not incur any excitation noise bias, and therefore the need for discounted cost functions is circumvented, which in turn ensures closed-loop stability. It is shown that the proposed algorithms converge to the solution of the LQR Riccati equation. A comprehensive simulation study is carried out, which illustrates the proposed scheme.
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2018.2870075
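
The summary above describes the approach only at a high level. Below is a minimal sketch of model-free Q-learning value iteration for the discrete-time LQR, written with full-state measurements for brevity rather than the paper's output feedback parameterization, which instead builds the Q-function from past input-output data. The system matrices, cost weights, and tuning values are illustrative assumptions used only to generate data; the least-squares update itself uses no model information.

```python
import numpy as np

np.random.seed(0)

# Hypothetical double-integrator system, used only to generate data
# (the learning update below never touches A or B directly).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(2)          # state weighting (assumed for illustration)
Rc = np.array([[1.0]])  # input weighting (assumed for illustration)
n, m = 2, 1
p = n + m               # dimension of z = [x; u]

def quad_basis(z):
    # Independent quadratic terms z_i z_j, i <= j (upper triangle).
    return np.outer(z, z)[np.triu_indices(len(z))]

def kernel_from(theta):
    # Rebuild the symmetric kernel H from the regression parameters.
    U = np.zeros((p, p))
    U[np.triu_indices(p)] = theta
    return 0.5 * (U + U.T)

H = np.zeros((p, p))    # Q-function kernel estimate: Q(x, u) = [x; u]' H [x; u]

for _ in range(100):    # value iteration on the Q-function kernel
    Phi, targets = [], []
    x = np.random.randn(n)
    for _ in range(60):                       # one batch of exploratory data
        u = 0.5 * np.random.randn(m)          # persistently exciting input
        x_next = A @ x + B @ u
        # Greedy one-step cost-to-go under the current kernel.
        Huu, Hux = H[n:, n:], H[n:, :n]
        u_star = (-np.linalg.solve(Huu, Hux @ x_next)
                  if np.linalg.matrix_rank(Huu) == m else np.zeros(m))
        z_star = np.concatenate([x_next, u_star])
        targets.append(x @ Qc @ x + u @ Rc @ u + z_star @ H @ z_star)
        Phi.append(quad_basis(np.concatenate([x, u])))
        x = x_next
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
    H = kernel_from(theta)

# Learned feedback gain u = -K x, compared against iterating the Riccati equation.
K_learned = np.linalg.solve(H[n:, n:], H[n:, :n])
P = np.zeros((n, n))
for _ in range(500):
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
K_riccati = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
print("learned gain :", K_learned)
print("Riccati gain :", K_riccati)
```

The two printed gains agreeing illustrates the kind of convergence-to-the-Riccati-solution result the paper establishes, though the paper's algorithms operate on measured outputs rather than the full state assumed in this sketch.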