PyLEnM: A Machine Learning Framework for Long-Term Groundwater Contamination Monitoring Strategies

Bibliographic Details
Published in: Environmental Science & Technology Vol. 56; no. 9; pp. 5973-5983
Main Authors: Meray, Aurelien O, Sturla, Savannah, Siddiquee, Masudur R, Serata, Rebecca, Uhlemann, Sebastian, Gonzalez-Raymat, Hansell, Denham, Miles, Upadhyay, Himanshu, Lagos, Leonel E, Eddy-Dilek, Carol, Wainwright, Haruko M
Format: Journal Article
Language: English
Published: United States: American Chemical Society, 03-05-2022
Description
Summary: In this study, we have developed a comprehensive machine learning (ML) framework for long-term groundwater contamination monitoring as the Python package PyLEnM (Python for Long-term Environmental Monitoring). PyLEnM aims to establish the seamless data-to-ML pipeline with various utility functions, such as quality assurance and quality control (QA/QC), coincident/colocated data identification, the automated ingestion and processing of publicly available spatial data layers, and novel data summarization/visualization. The key ML innovations include (1) time series/multianalyte clustering to find the well groups that have similar groundwater dynamics and to inform spatial interpolation and well optimization, (2) the automated model selection and parameter tuning, comparing multiple regression models for spatial interpolation, (3) the proxy-based spatial interpolation method by including spatial data layers or in situ measurable variables as predictors for contaminant concentrations and groundwater levels, and (4) the new well optimization algorithm to identify the most effective subset of wells for maintaining the spatial interpolation ability for long-term monitoring. We demonstrate our methodology using the monitoring data at the Savannah River Site F-Area. Through this open-source PyLEnM package, we aim to improve the transparency of data analytics at contaminated sites, empowering concerned citizens as well as improving public relations.
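The idea behind item (4), choosing a subset of wells that preserves spatial interpolation ability, can be illustrated with a minimal sketch. This is not PyLEnM's API or the authors' algorithm: the function names, the greedy forward-selection strategy, the inverse-distance-weighting (IDW) interpolator (swapped in for the regression models the summary mentions), and the synthetic well data are all assumptions made for illustration.

```python
import math


def idw_predict(known, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) tuples."""
    num = den = 0.0
    for x, y, v in known:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target coincides with a known well
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def interpolation_error(selected, wells):
    """Mean squared error of the IDW estimate, evaluated at every well."""
    total = sum((idw_predict(selected, (x, y)) - v) ** 2 for x, y, v in wells)
    return total / len(wells)


def greedy_well_subset(wells, n_keep):
    """Hypothetical greedy selection: repeatedly add the well that gives the
    lowest interpolation error over the full well network."""
    selected, remaining, history = [], list(wells), []
    while remaining and len(selected) < n_keep:
        best = min(
            remaining,
            key=lambda w: interpolation_error(selected + [w], wells),
        )
        selected.append(best)
        remaining.remove(best)
        history.append(interpolation_error(selected, wells))
    return selected, history


# Synthetic (x, y, concentration) values, invented for this example only.
wells = [
    (0.0, 0.0, 10.0),
    (1.0, 0.0, 12.0),
    (0.0, 1.0, 9.0),
    (1.0, 1.0, 15.0),
    (0.5, 0.5, 11.0),
    (2.0, 2.0, 30.0),
]

subset, errors = greedy_well_subset(wells, n_keep=3)
print("kept wells:", subset)
print("error after each addition:", errors)
```

The error history shows how much interpolation accuracy each retained well buys, which is the kind of trade-off a monitoring program would weigh when reducing its well network. A greedy search is only one possible strategy and is not guaranteed to find the best subset.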
Bibliography:
USDOE Office of Environmental Management (EM)
AC02-05CH11231; EM0005213; 89303321CEM000080
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
USDOE Office of Science (SC), Basic Energy Sciences (BES)
ISSN: 0013-936X; 1520-5851
DOI: 10.1021/acs.est.1c07440