Undiscounted reinforcement learning for infinite-time optimal output tracking and disturbance rejection of discrete-time LTI systems with unknown dynamics

Bibliographic Details
Published in:	International journal of systems science, Vol. 54, no. 10, pp. 2175-2195
Main Authors:	Amirparast, Ali; Hosseini Sani, S. Kamal
Format:	Journal Article
Language:	English
Published:	London: Taylor & Francis Ltd, 27-07-2023
Subjects:	Algorithms; Closed loops; Cost function; Discounts; Discrete time systems; Feedback control; Linear quadratic tracking; Machine learning; Optimal control; policy iteration; reinforcement learning; Steady state; System dynamics; Tracking control

Description
Summary:	This paper proposes a novel control structure for the infinite-time linear quadratic tracking (LQT) problem. The main challenge in the LQT problem is keeping the cost function bounded over an infinite horizon. Many studies introduce a discount factor to overcome this challenge; however, discounting can affect the stability of the closed-loop system and the steady-state error. The proposed optimal control structure guarantees zero steady-state error with a bounded cost function without utilising a discount factor. The optimal gains of the proposed control structure are computed via model-based and model-free reinforcement learning (RL) algorithms. As a novelty among model-based RL algorithms, a model predictive RL algorithm is proposed to reduce the number of iterations in the learning phase. A model-free RL algorithm is utilised to obtain the optimal control for tracking the reference online, without any knowledge of the system dynamics. Finally, simulation results verify the advantages of the proposed optimal control structure.
ISSN:	0020-7721, 1464-5319
DOI:	10.1080/00207721.2023.2221240
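
The summary says the optimal gains are computed with model-based and model-free RL algorithms, and the subject list includes policy iteration. As a rough illustration of the model-based building block only, the sketch below runs standard policy iteration (often called Hewer's algorithm) on an undiscounted discrete-time LQ regulator. It is not the authors' control structure or their model predictive RL algorithm. The matrices `A`, `B`, `Q`, `R`, the zero initial gain, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def evaluate_policy(A, B, Q, R, K, sweeps=5000, tol=1e-12):
    """Policy evaluation: solve P = Q + K'RK + (A-BK)' P (A-BK).

    Uses plain fixed-point iteration, which converges only if the
    gain K is stabilising (spectral radius of A - BK below 1).
    """
    Ac = A - B @ K
    Qk = Q + K.T @ R @ K
    P = np.zeros_like(Q)
    for _ in range(sweeps):
        P_next = Qk + Ac.T @ P @ Ac
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P


def policy_iteration(A, B, Q, R, K0, iters=50, tol=1e-10):
    """Alternate policy evaluation and policy improvement (no discount)."""
    K = K0
    for _ in range(iters):
        P = evaluate_policy(A, B, Q, R, K)
        # Policy improvement: K = (R + B'PB)^{-1} B'PA
        K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        converged = np.max(np.abs(K_new - K)) < tol
        K = K_new
        if converged:
            break
    # Re-evaluate so the returned P matches the returned K.
    return K, evaluate_policy(A, B, Q, R, K)


# Hypothetical open-loop stable system, so K0 = 0 is a stabilising start.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K, P = policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))
print("gain K:", K)
print("closed-loop spectral radius:",
      np.max(np.abs(np.linalg.eigvals(A - B @ K))))
```

The regulator cost here stays bounded without a discount factor because the state is driven to zero. For tracking a nonzero reference, the steady-state input is generally nonzero, which is the boundedness issue the summary describes.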