Reinforcement Learning Based Optimal Tracking Control Under Unmeasurable Disturbances With Application to HVAC Systems
This paper presents the design of an optimal controller for solving tracking problems subject to unmeasurable disturbances and unknown system dynamics using reinforcement learning (RL). Many existing RL control methods take disturbance into account by directly measuring it and manipulating it for ex...
Saved in:
Published in: | IEEE transaction on neural networks and learning systems Vol. 33; no. 12; pp. 7523 - 7533 |
---|---|
Main Authors: | , , |
Format: | Journal Article |
Language: | English |
Published: |
Piscataway
IEEE
01-12-2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | This paper presents the design of an optimal controller for solving tracking problems subject to unmeasurable disturbances and unknown system dynamics using reinforcement learning (RL). Many existing RL control methods take disturbance into account by directly measuring it and manipulating it for exploration during the learning process, thereby preventing any disturbance induced bias in the control estimates. However, in most practical scenarios, disturbance is neither measurable nor manipulable. The main contribution of this article is the introduction of a combination of a bias compensation mechanism and the integral action in the Q-learning framework to remove the need to measure or manipulate the disturbance, while preventing disturbance induced bias in the optimal control estimates. A bias compensated Q-learning scheme is presented that learns the disturbance induced bias terms separately from the optimal control parameters and ensures the convergence of the control parameters to the optimal solution even in the presence of unmeasurable disturbances. Both state feedback and output feedback algorithms are developed based on policy iteration (PI) and value iteration (VI) that guarantee the convergence of the tracking error to zero. The feasibility of the design is validated on a practical optimal control application of a heating, ventilating, and air conditioning (HVAC) zone controller. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
ISSN: | 2162-237X 2162-2388 |
DOI: | 10.1109/TNNLS.2021.3085358 |