Undiscounted reinforcement learning for infinite-time optimal output tracking and disturbance rejection of discrete-time LTI systems with unknown dynamics
Published in: | International Journal of Systems Science, Vol. 54, no. 10, pp. 2175-2195 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | London: Taylor & Francis, 27-07-2023 (Taylor & Francis Ltd) |
Subjects: | |
Summary: | This paper proposes a novel control structure to solve the infinite-time linear quadratic tracking (LQT) problem. The major challenge in the infinite-time LQT problem is keeping the cost function bounded. Many studies introduce a discount factor to overcome this challenge; however, discounting can degrade the stability of the closed-loop system and introduce a steady-state error. This paper proposes an optimal control structure that guarantees zero steady-state error with a bounded cost function, without using a discount factor. The optimal gains of the proposed control structure are computed via model-based and model-free reinforcement learning (RL) algorithms. As a novel model-based RL algorithm, a model predictive RL algorithm is proposed to reduce the number of iterations required in the learning phase. A model-free RL algorithm is used to learn the optimal tracking control online, without any knowledge of the system dynamics. Finally, simulation results verify the advantages of the proposed optimal control structure. |
---|---|
ISSN: | 0020-7721 1464-5319 |
DOI: | 10.1080/00207721.2023.2221240 |
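The abstract describes the method only at a high level. As an illustration, and not as the paper's control structure or its model predictive RL algorithm, the sketch below uses one standard way to keep an undiscounted LQT cost bounded: rewrite the plant in incremental form, with state increments and the tracking error e = r - y as the regulated state and the control increment as the input. For a constant reference every term of the cost then decays to zero. The gain is computed by plain model-based value iteration on the Riccati equation, starting from P = 0. The plant matrices, the weights Q and R, the reference value, and the function names `augment`, `value_iteration` and `track` are all hypothetical. The sketch also assumes NumPy and a constant reference.

```python
import numpy as np

# Hypothetical plant x_{k+1} = A x_k + B u_k, y_k = C x_k (not taken from the paper).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
C = np.array([[1.0, 0.0]])


def augment(A, B, C):
    """Incremental model: state xi_k = [x_k - x_{k-1}; r - y_k], input du_k = u_k - u_{k-1}.

    For a constant reference r: e_{k+1} = e_k - C A dx_k - C B du_k.
    """
    n, p = A.shape[0], C.shape[0]
    Aa = np.block([[A, np.zeros((n, p))],
                   [-C @ A, np.eye(p)]])
    Ba = np.vstack([B, -C @ B])
    return Aa, Ba


def value_iteration(Aa, Ba, Q, R, tol=1e-10, max_iter=10_000):
    """Undiscounted model-based value iteration on the Riccati equation, starting from P = 0."""
    P = np.zeros_like(Q)
    for _ in range(max_iter):
        # Greedy (policy-improvement) gain for the current value matrix P.
        K = np.linalg.solve(R + Ba.T @ P @ Ba, Ba.T @ P @ Aa)
        # Value update: P <- Q + Aa' P Aa - Aa' P Ba (R + Ba' P Ba)^{-1} Ba' P Aa.
        P_next = Q + Aa.T @ P @ (Aa - Ba @ K)
        converged = np.max(np.abs(P_next - P)) <= tol * max(1.0, np.max(np.abs(P_next)))
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + Ba.T @ P @ Ba, Ba.T @ P @ Aa)
    return P, K


def track(A, B, C, K, Q, R, r, steps):
    """Run the plant under du_k = -K xi_k and accumulate the undiscounted cost."""
    n, m = B.shape
    x_prev, x, u_prev = np.zeros((n, 1)), np.zeros((n, 1)), np.zeros((m, 1))
    cost = 0.0
    for _ in range(steps):
        xi = np.vstack([x - x_prev, r - C @ x])      # [dx_k; e_k]
        du = -K @ xi
        cost += (xi.T @ Q @ xi + du.T @ R @ du).item()
        u = u_prev + du                               # integrate increments into u_k
        x_prev, x, u_prev = x, A @ x + B @ u, u
    return x, u_prev, cost


Q = np.eye(3)            # weights on [dx; e] (hypothetical)
R = np.array([[0.1]])    # weight on du (hypothetical)
r = 1.0                  # constant reference (hypothetical)

Aa, Ba = augment(A, B, C)
P, K = value_iteration(Aa, Ba, Q, R)
x, u, J = track(A, B, C, K, Q, R, r, steps=5000)

xi0 = np.array([[0.0], [0.0], [r]])                  # x_0 = x_{-1} = 0, so e_0 = r
rho = max(abs(np.linalg.eigvals(Aa - Ba @ K)))
print("closed-loop spectral radius:", rho)
print("final tracking error:", r - (C @ x).item())
print("accumulated cost J:", J, " predicted xi0' P xi0:", (xi0.T @ P @ xi0).item())
```

The error is part of the regulated state, so any stabilising gain drives e_k to zero. This gives zero steady-state error without a discount factor. The accumulated undiscounted cost J should also agree with the quadratic form ξ₀ᵀPξ₀. A model-free counterpart, as in the abstract, would learn this gain from measured data instead of using A, B and C.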