Building an XGBoost model based on landscape metrics and meteorological data for nonpoint source pollution management in the Nakdong river watershed


Bibliographic Details
Published in: Ecological Indicators, Vol. 165, p. 112156
Main Authors: Shim, Sun Hee; Choi, Jung Hyun
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01-08-2024
Description
Summary:
• Build a machine learning model to predict the achievement of TMDL water quality goals.
• Meteorological factors strongly influence the model predictions for all land use types.
• Identify the most influential landscape metrics for each land use type.
• Provide optimal river water quality management strategies based on watershed characteristics.

To manage river water quality effectively under the current fourth-phase Total Maximum Daily Load (TMDL) management system, we built a machine learning model that predicts whether water quality goals are achieved across the entire Nakdong River watershed in Korea. First, to account for the effect of land use type on the runoff characteristics of pollutants, K-means clustering was used to classify the watershed into three area types: agricultural, forest, and urban. Next, we developed a machine learning model to predict the achievement of BOD, TP, and TOC water quality goals in the different rainfall seasons. The training data were preprocessed with the Isolation Forest and ADASYN techniques. Finally, SHAP was used to identify the factors with the greatest effect on the achievement of water quality goals. The model's average prediction accuracy for TP, BOD, and TOC ranged from 0.6 to 1.0. Meteorological factors, particularly monthly precipitation and average temperature, strongly influenced the model predictions for all land use types. Among the landscape metrics, ED was highly important in all land use types. CONTAG was the main factor in agricultural areas; ED, LPI, CONTAG, COHESION, and SHDI were the main factors in forest areas; and PD, ED, SHDI, and COHESION were the main factors in urban areas. Monthly precipitation and average temperature significantly affected whether the TMDL water quality goals were achieved in all sub-watersheds, whereas the most influential landscape metrics differed by land use type. Watershed management therefore needs to be tailored to land use characteristics. These results offer land use managers and landscape planners guidance for achieving water quality goals through the management of nonpoint source pollution.
ISSN: 1470-160X, 1872-7034
DOI:10.1016/j.ecolind.2024.112156
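
The land use classification step described in the summary can be illustrated with a small sketch. The summary does not say which features were clustered or which software was used, so everything below is an assumption made for illustration: the sub-watershed IDs, the land-use shares, the `kmeans` helper, and the farthest-point initialization are all hypothetical. The sketch covers only the K-means step, not the Isolation Forest, ADASYN, XGBoost, or SHAP stages.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm with a deterministic farthest-point start.

    This is an illustrative helper, not the authors' implementation.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical sub-watersheds with (agricultural, forest, urban) area shares.
land_use_share = {
    "SW01": (0.62, 0.30, 0.08),
    "SW02": (0.55, 0.35, 0.10),
    "SW03": (0.15, 0.80, 0.05),
    "SW04": (0.10, 0.85, 0.05),
    "SW05": (0.20, 0.25, 0.55),
    "SW06": (0.15, 0.20, 0.65),
}

names = list(land_use_share)
labels, centroids = kmeans([land_use_share[n] for n in names], k=3)

# Name each cluster after the dominant land use of its centroid.
LAND_USES = ("agricultural", "forest", "urban")
cluster_name = [LAND_USES[c.index(max(c))] for c in centroids]
area_type = {n: cluster_name[lab] for n, lab in zip(names, labels)}
print(area_type)
```

In practice a library implementation such as scikit-learn's `KMeans` would normally replace the hand-written helper. The pure-Python version is used here only to keep the sketch free of dependencies.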