Application of random forest regression and comparison of its performance to multiple linear regression in modeling groundwater nitrate concentration at the African continent scale

Bibliographic Details
Published in: Hydrogeology Journal, Vol. 27, no. 3, pp. 1081-1098
Main Authors: Ouedraogo, Issoufou; Defourny, Pierre; Vanclooster, Marnik
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01-05-2019; Springer Nature B.V.
Description
Summary: Groundwater management decisions require robust methods that allow accurate predictive modeling of pollutant occurrences. In this study, random forest regression (RFR) was used for modeling groundwater nitrate contamination at the African continent scale. When compared to more conventional techniques, key advantages of RFR include its nonparametric nature, its high predictive accuracy, and its capability to determine variable importance. The latter can be used to better understand the individual role and the combined effect of explanatory variables in a predictive model. In the absence of a systematic groundwater monitoring program at the African continent scale, the study tested the modeling approach on a groundwater nitrate contamination database for the continent obtained from a meta-analysis; 250 groundwater nitrate pollution studies from the African continent were compiled from the literature. A geographic information system database of 13 spatial attributes was collected, related to land use, soil type, hydrogeology, topography, climatology, type of region, and nitrogen fertilizer application rate, and these attributes were used as predictors. The RFR performance was evaluated against multiple linear regression (MLR). By using RFR, it was possible to establish which explanatory variables influence the occurrence of nitrate pollution in groundwater (population density, rainfall, recharge, etc.). Both the RFR and MLR techniques identified population density as the most important variable explaining reported nitrate contamination. However, RFR had a much higher predictive power (R² = 0.97) than the traditional linear regression model (R² = 0.64). RFR is therefore considered a very promising technique for large-scale modeling of groundwater nitrate pollution.
ISSN: 1431-2174, 1435-0157
DOI: 10.1007/s10040-018-1900-5
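
Illustration: the kind of comparison described in the summary (fitting RFR and MLR on the same predictors, comparing R², and ranking variable importance) can be sketched with scikit-learn. This is a minimal sketch on synthetic data, not the authors' code or dataset: the four predictors, the response function, the train/test split, and all parameter values are assumptions made purely for illustration, and the resulting numbers have no relation to the R² values reported in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): four made-up predictors, not the study's
# 13 spatial attributes.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 4))

# The response depends nonlinearly on column 0 and weakly, linearly on
# column 1; columns 2 and 3 carry no signal.
y = 20.0 * np.abs(X[:, 0] - 0.5) + 2.0 * X[:, 1] + rng.normal(0.0, 0.5, size=500)

# Hold out 30% of the samples so both models are scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Multiple linear regression (parametric baseline).
mlr = LinearRegression().fit(X_train, y_train)

# Random forest regression (nonparametric ensemble of regression trees).
rfr = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Coefficient of determination for each model on the held-out samples.
r2_mlr = r2_score(y_test, mlr.predict(X_test))
r2_rfr = r2_score(y_test, rfr.predict(X_test))

# Impurity-based variable importance; values are normalized to sum to 1.
importances = rfr.feature_importances_
ranking = np.argsort(importances)[::-1]  # column indices, most important first

print(f"MLR R^2: {r2_mlr:.2f}")
print(f"RFR R^2: {r2_rfr:.2f}")
for idx in ranking:
    print(f"column {idx}: importance {importances[idx]:.3f}")
```

Because the synthetic response is deliberately nonlinear in one predictor, the forest should outscore the linear model here; on real data the gap depends on how nonlinear the underlying relationships are and on how the models are validated.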