Battery control with lookahead constraints in distribution grids using reinforcement learning


Bibliographic Details
Published in: Electric Power Systems Research, Vol. 211, p. 108551
Main Authors: da Silva André, Joel, Stai, Eleni, Stanojev, Ognjen, Hug, Gabriela
Format: Journal Article
Language:English
Published: Elsevier B.V., 01-10-2022
Description
Summary: In this paper, a computationally efficient real-time control scheme for a battery with lookahead state-of-energy constraints in active distribution grids with distributed energy sources is presented. The goal is either to follow a previously computed dispatch plan or to minimize the monetary cost of buying and selling power at the point of common coupling. However, the lookahead constraints render the battery decisions non-trivial. The current practice in the literature is to solve this problem with Model Predictive Control (MPC), which does not scale to large grids. Instead, we propose a reinforcement learning approach based on the Deep Deterministic Policy Gradient (DDPG) algorithm. To satisfy the lookahead battery constraints, we adapt the experience replay technique used in DDPG. To guarantee satisfaction of the hard grid constraints, we introduce a safety layer that performs constrained optimization. In contrast to MPC, our approach requires no forecasts. We evaluate the method on a realistic grid and compare it with Lyapunov optimization and MPC, showing that it achieves costs close to both while reducing the computational time by multiple orders of magnitude.

Highlights:
•Fast real-time battery control scheme in active distribution grids.
•Works without forecasts and without slow multiperiod optimization.
•Satisfies battery state-of-energy lookahead constraints and grid constraints.
•Uses reinforcement learning with two experience replay buffers and a safety layer.
•Outperforms Model Predictive Control in speed while matching its cost performance.
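The safety-layer idea from the abstract can be illustrated with a minimal sketch: the RL policy proposes a battery power setpoint, and a constrained projection maps it to the nearest feasible action so that the state-of-energy (SoE) stays within its bounds over the next step. All names, units, and limits below are illustrative assumptions, not the authors' actual formulation (in the 1-D battery case the constrained optimization reduces to interval clipping).

```python
def safety_layer(p_proposed, soe, dt=1.0,
                 soe_min=0.1, soe_max=0.9,
                 p_min=-0.5, p_max=0.5):
    """Project a proposed battery power (positive = charging, in
    per-unit of capacity) onto the feasible set implied by the power
    and SoE limits. In 1-D the Euclidean projection onto a box is
    simply clipping to the feasible interval."""
    # Power limits implied by the SoE bounds over the next time step.
    p_lo = max(p_min, (soe_min - soe) / dt)
    p_hi = min(p_max, (soe_max - soe) / dt)
    # Nearest feasible action.
    return min(max(p_proposed, p_lo), p_hi)

# Example: near-full battery, policy asks for maximum charging.
safe_p = safety_layer(p_proposed=0.5, soe=0.85)
print(safe_p)  # capped near 0.05 so the SoE cannot exceed soe_max
```

In the paper's setting the projection would additionally enforce the hard grid constraints (e.g. voltage and line limits), which requires a genuine constrained-optimization solve rather than clipping; this sketch only captures the battery-side bounds.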
ISSN:0378-7796
DOI:10.1016/j.epsr.2022.108551