Online reinforcement learning for a continuous space system with experimental validation

Bibliographic Details
Published in:Journal of Process Control, Vol. 104, pp. 86–100
Main Authors: Dogru, Oguzhan, Wieczorek, Nathan, Velswamy, Kirubakaran, Ibrahim, Fadi, Huang, Biao
Format: Journal Article
Language:English
Published: Elsevier Ltd 01-08-2021
Description
Summary:Reinforcement learning (RL) for continuous state/action space systems has remained a challenge for nonlinear multivariate dynamical systems, even at the simulation level. Implementing such schemes for real-time control is more difficult still and remains largely unaddressed. In this study, several critical strategies for the practical implementation of RL are developed, and a multivariable, multi-modal, hybrid three-tank (HTT) physical process is used to illustrate them. A successful real-time implementation of RL is reported. The first step is meta-heuristic first-principles model parameter optimization, in which a custom pseudo-random binary signal (PRBS) is used to obtain open-loop experimental data. This is followed by in silico asynchronous advantage actor–critic (A3C/A-A2C) policy learning. In the second step, three approaches (namely proximal learning, single-trajectory learning, and multiple-trajectory learning) are used to explore the state/action space. In the final step, online learning (A2C), starting from the best in silico policy, is established on the real process via a socket connection. The extent of exploration (EoE) is proposed as a measure for quantifying exploration of the state/action space. To enhance the online sample efficiency of the RL application, a soft-constraint-based constrained learning scheme is proposed and validated. With the proposed strategies, this work demonstrates the feasibility of applying RL to practical control problems.
•Reinforcement learning for a nonlinear multivariate continuous state/action space.
•First-principles-based metaheuristic parameter tuning using plant data.
•Simulation-based asynchronous advantage actor–critic (A3C) policy learning.
•Extent of exploration’s role in improving on entropy-based policy generalization.
•Online real-time implementation of soft-constrained policy updates discussed.
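The summary mentions a custom PRBS used to collect open-loop identification data. The paper's exact signal design is not given here, so the sketch below is a generic two-level excitation with randomized dwell times; the amplitude levels, minimum hold time, and sequence length are illustrative assumptions, not the authors' settings.

```python
# Hypothetical sketch of a PRBS-style open-loop excitation signal.
# Levels, hold times, and length are assumptions for illustration only.
import random

def prbs(n_samples, low=0.0, high=1.0, min_hold=5, seed=0):
    """Two-level signal that holds each level for a random number of
    steps (at least `min_hold`) before toggling to the other level."""
    rng = random.Random(seed)
    signal, level = [], low
    while len(signal) < n_samples:
        hold = rng.randint(min_hold, 3 * min_hold)  # random dwell time
        signal.extend([level] * hold)
        level = high if level == low else low       # toggle level
    return signal[:n_samples]

# Example: 100-sample excitation between two valve openings.
u = prbs(100, low=0.2, high=0.8, min_hold=5)
```

Such a sequence would be applied to the plant inputs and the resulting open-loop responses used to fit the first-principles model parameters via the meta-heuristic search described in the summary.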
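The soft-constraint idea in the summary can be read as penalizing constraint violation in the reward rather than enforcing hard state bounds, so that the policy update itself discourages excursions. A minimal sketch of that idea, paired with the one-step advantage used by actor–critic methods, is shown below; the bounds, penalty weight, and discount factor are hypothetical, and the paper's actual formulation may differ.

```python
# Hypothetical sketch: soft state constraints as a reward penalty,
# combined with a one-step advantage estimate as used in A2C-style
# updates. All numerical settings are illustrative assumptions.
import numpy as np

def soft_constrained_reward(reward, state, low, high, weight=10.0):
    """Subtract a penalty proportional to how far `state` leaves [low, high]."""
    violation = np.maximum(0.0, low - state) + np.maximum(0.0, state - high)
    return reward - weight * float(np.sum(violation))

def advantage(rewards, values, gamma=0.99):
    """One-step advantage A_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    rewards, values = np.asarray(rewards), np.asarray(values)
    return rewards + gamma * values[1:] - values[:-1]

# A state of 1.2 exceeds the upper bound 1.0, so the reward is reduced.
r = soft_constrained_reward(1.0, np.array([1.2]), low=0.0, high=1.0)
# Advantages for a 2-step rollout given critic values at 3 states.
adv = advantage([1.0, 0.5], [0.2, 0.3, 0.1])
```

The penalized reward would feed into the advantage, and the advantage in turn weights the policy-gradient step, so constraint violations directly reduce the probability of the actions that caused them.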
ISSN:0959-1524
1873-2771
DOI:10.1016/j.jprocont.2021.06.004