Causal network inference in a dam system and its implications on feature selection for machine learning forecasting
Published in: | Physica A Vol. 604; p. 127893 |
---|---|
Main Authors: | , , , |
Format: | Journal Article |
Language: | English |
Published: | Elsevier B.V, 15-10-2022 |
Subjects: | |
Summary: | A fundamental goal across many research fields is to explain the possible mechanisms behind a phenomenon and to infer the correct causal relationships between variables. In this work, we employed various causal inference methods to derive the causal network of a dam system from time series data. We explored the lagged effects of water levels in two dams, climate and weather variables, and domestic and agricultural water demands on each other. Among the methods considered, we demonstrated that convergent cross mapping (CCM), a method for inferring causal relationships in complex systems from time series data, is the most consistent with an actual dam system: (1) causal links were consistent with the direction of the physical flow of water, (2) the effects of climate and weather variables were successfully captured, and (3) the time lags shed light on the dynamics of the dam system and possibly reflected planning schedules that are not explicit in the data. Our results captured both intuitive and counter-intuitive causal links, some of which were validated by domain experts. Using the resulting causal links to pre-select the input variables of machine learning-based forecasting models significantly reduces the prediction errors compared with using randomly selected features. Specifically, the best reduction in mean absolute error (MAE) is 4.2–4.4 meters, that is, an MAE 2.8–3.0 times lower than with randomly selected features. CCM was also able to identify the top 20 significant predictors, beyond which adding further variables yielded negligible improvement in the MAE. This is the first work to demonstrate successful inference of a time-lagged causal network of endogenous and exogenous variables in a dam system. |
---|---|
ISSN: | 0378-4371 1873-2119 |
DOI: | 10.1016/j.physa.2022.127893 |
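
The summary describes CCM only in words, so a minimal sketch of the basic idea may help readers unfamiliar with it. This is not the authors' implementation. The names `delay_embed`, `cross_map_skill` and `rank_drivers`, and the parameters `E` (embedding dimension), `tau` (delay) and `lib_size` (library length), are hypothetical choices for illustration. The sketch also leaves out the lag scan and significance testing that a real analysis would need. The toy data are a pair of coupled logistic maps, not dam data.

```python
import numpy as np


def delay_embed(x, E, tau):
    """Shadow manifold: row t is (x[t], x[t - tau], ..., x[t - (E - 1) * tau])."""
    start = (E - 1) * tau
    return np.column_stack(
        [x[start - j * tau: len(x) - j * tau] for j in range(E)]
    )


def cross_map_skill(x, y, E=2, tau=1, lib_size=None):
    """Estimate y from the shadow manifold of x and return the Pearson
    correlation between estimate and truth. High skill that grows with
    lib_size is read as evidence that y influences x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    M = delay_embed(x, E, tau)
    target = y[(E - 1) * tau:]          # align y with the embedded rows
    n = len(M) if lib_size is None else min(lib_size, len(M))
    M, target = M[:n], target[:n]

    # Pairwise distances on the manifold; a point is never its own neighbour.
    dist = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)

    # E + 1 nearest neighbours, weighted exponentially by relative distance.
    nn = np.argsort(dist, axis=1)[:, :E + 1]
    d = np.take_along_axis(dist, nn, axis=1)
    w = np.exp(-d / np.maximum(d[:, [0]], 1e-12))
    w /= w.sum(axis=1, keepdims=True)

    y_hat = (w * target[nn]).sum(axis=1)
    return np.corrcoef(y_hat, target)[0, 1]


def rank_drivers(target, candidates, top_k=20, **kwargs):
    """Rank candidate driver series by how well the target's shadow
    manifold recovers each of them (illustrative feature pre-selection)."""
    scores = {
        name: cross_map_skill(target, series, **kwargs)
        for name, series in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy system (an assumption for illustration): two coupled logistic maps.
    n = 1000
    x = np.empty(n)
    y = np.empty(n)
    x[0], y[0] = 0.4, 0.2
    for t in range(n - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
    # Skill for both directions at increasing library sizes.
    for L in (50, 200, 800):
        print(L, cross_map_skill(y, x, lib_size=L), cross_map_skill(x, y, lib_size=L))
```

In this sketch, `rank_drivers(target, candidates, top_k=20)` mirrors the pre-selection step the summary describes. Candidate series that the target's manifold recovers well are kept as forecasting inputs, and the rest are dropped.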