Reinforcement Learning Based Optimal Tracking Control Under Unmeasurable Disturbances With Application to HVAC Systems

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems Vol. 33; no. 12; pp. 7523-7533
Main Authors: Rizvi, Syed Ali Asad, Pertzborn, Amanda J., Lin, Zongli
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01-12-2022
Description
Summary: This paper presents the design of an optimal controller for solving tracking problems subject to unmeasurable disturbances and unknown system dynamics using reinforcement learning (RL). Many existing RL control methods account for the disturbance by directly measuring it and manipulating it for exploration during the learning process, thereby preventing any disturbance-induced bias in the control estimates. However, in most practical scenarios, the disturbance is neither measurable nor manipulable. The main contribution of this article is the introduction of a combination of a bias compensation mechanism and integral action in the Q-learning framework to remove the need to measure or manipulate the disturbance, while preventing disturbance-induced bias in the optimal control estimates. A bias-compensated Q-learning scheme is presented that learns the disturbance-induced bias terms separately from the optimal control parameters and ensures the convergence of the control parameters to the optimal solution even in the presence of unmeasurable disturbances. Both state-feedback and output-feedback algorithms are developed based on policy iteration (PI) and value iteration (VI) that guarantee the convergence of the tracking error to zero. The feasibility of the design is validated on a practical optimal control application of a heating, ventilating, and air conditioning (HVAC) zone controller.
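The role of integral action mentioned in the summary can be illustrated with a minimal sketch. It is not the article's Q-learning algorithm: the scalar plant, the hand-picked gains, and the name `simulate` are all assumptions made for illustration, and the article learns its controller without knowing the plant model. The sketch only shows why an integrator drives the tracking error to zero when a constant disturbance `d` is never measured, whereas a fixed feedforward gain tuned for the disturbance-free plant leaves a persistent offset.

```python
def simulate(k_x, k_i, k_r, d, r=1.0, a=0.9, b=0.5, steps=200):
    """Simulate x[k+1] = a*x[k] + b*u[k] + d and return the final error r - x.

    Hypothetical scalar plant; the controller never reads d directly.
    """
    x, z = 0.0, 0.0  # plant state and integrator state
    for _ in range(steps):
        # Control law: state feedback, integral term, reference feedforward.
        u = -k_x * x + k_i * z + k_r * r
        # Integrator accumulates the tracking error at the current step.
        z += r - x
        # Plant update with the constant, unmeasured disturbance d.
        x = a * x + b * u + d
    return r - x


# Feedforward gain 1.2 gives exact tracking only when d = 0 (assumed values).
err_nominal = simulate(k_x=1.0, k_i=0.0, k_r=1.2, d=0.0)
err_feedforward = simulate(k_x=1.0, k_i=0.0, k_r=1.2, d=0.3)
# Integral action needs no feedforward gain and no knowledge of d.
err_integral = simulate(k_x=1.0, k_i=0.2, k_r=0.0, d=0.3)

print("feedforward, no disturbance:  ", err_nominal)
print("feedforward, with disturbance:", err_feedforward)
print("integral action, disturbance: ", err_integral)
```

With these assumed values the feedforward design settles with an error near -0.5 once the disturbance is present, while the integral design settles with an error that is numerically zero.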
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2021.3085358