Evaluating the predictive power of different machine learning algorithms for groundwater salinity prediction of multi-layer coastal aquifers in the Mekong Delta, Vietnam

Bibliographic Details
Published in:Ecological indicators Vol. 127; p. 107790
Main Authors: Tran, Dang An, Tsujimura, Maki, Ha, Nam Thang, Nguyen, Van Tam, Binh, Doan Van, Dang, Thanh Duc, Doan, Quang-Van, Bui, Dieu Tien, Anh Ngoc, Trieu, Phu, Le Vo, Thuc, Pham Thi Bich, Pham, Tien Dat
Format: Journal Article
Language:English
Published: Elsevier Ltd 01-08-2021
Description
Summary:
• The top ten influencing factors of groundwater salinization were identified.
• The CatBoost Regression model provides the most accurate salinity prediction.
• In the Mekong Delta, groundwater pumping has a strong impact on salinization processes.
• Forty-eight percent of the population lives in areas exceeding the salinity threshold (>250 mg/L).
• Immediate actions are needed to prevent groundwater salinization.

Groundwater salinization is considered a major environmental problem in coastal areas worldwide, affecting ecosystems and human health. However, accurately predicting salinity concentration in groundwater remains a challenge because of the complexity of groundwater salinization processes and their influencing factors. In this study, we evaluate state-of-the-art machine learning (ML) algorithms for predicting groundwater salinity and identifying its influencing factors. We conducted a study of the coastal multi-layer aquifers of the Mekong River Delta (Vietnam), using a geodatabase of 216 groundwater samples and 14 conditioning factors. We compared the predictive performance of four ML techniques: the Random Forest Regression (RFR), the Extreme Gradient Boosting Regression (XGBR), the CatBoost Regression (CBR), and the Light Gradient Boosting Regression (LGBR) models. Model performance was assessed using the root-mean-square error (RMSE), the coefficient of determination (R²), the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). The results show that the CBR model performs best on both the training (R² = 0.999, RMSE = 29.90) and testing datasets (R² = 0.84, RMSE = 205.96, AIC = 720.60, and BIC = 751.04). Ten of the 14 influencing factors are the most important for groundwater salinity prediction: the distance to saline sources, the depth of the well screen, the groundwater level, the vertical hydraulic conductivity, the operation time, the well density, the extraction capacity, the thickness of the aquitard, the distance to faults, and the horizontal hydraulic conductivity. The results provide insights for policymakers in proposing remediation and management strategies for groundwater salinity issues in the context of excessive groundwater exploitation in coastal lowland regions. Because human-induced factors have significantly influenced groundwater salinization, urgent action should be considered to ensure sustainable groundwater management in the coastal areas of the Mekong River Delta.
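The four evaluation metrics named in the summary (RMSE, R², AIC, BIC) can be illustrated with a minimal Python sketch. The record does not state which AIC/BIC formulation the authors used; the version below assumes the common Gaussian least-squares form, AIC = n·ln(RSS/n) + 2k and BIC = n·ln(RSS/n) + k·ln(n). The function name `regression_metrics`, the parameter count `n_params`, and the sample values are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred, n_params):
    """Return RMSE, R2, AIC and BIC for one set of predictions.

    Assumption: AIC and BIC use the Gaussian least-squares form,
    AIC = n*ln(RSS/n) + 2k and BIC = n*ln(RSS/n) + k*ln(n),
    where k (n_params) is the number of model parameters.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Residual sum of squares and total sum of squares
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / n
    tss = sum((t - mean_true) ** 2 for t in y_true)
    if rss == 0 or tss == 0:
        raise ValueError("metrics are undefined for a perfect fit or constant targets")

    fit_term = n * math.log(rss / n)
    return {
        "rmse": math.sqrt(rss / n),
        "r2": 1.0 - rss / tss,
        "aic": fit_term + 2 * n_params,
        "bic": fit_term + n_params * math.log(n),
    }


# Hypothetical salinity values in mg/L (illustrative only, not study data)
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted, n_params=2)
```

Lower RMSE, AIC, and BIC and higher R² indicate a better model. Unlike RMSE and R², AIC and BIC also penalize the number of parameters, so they favor simpler models when the fit is similar.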
ISSN:1470-160X, 1872-7034
DOI:10.1016/j.ecolind.2021.107790