Flood susceptibility modelling using novel hybrid approach of reduced-error pruning trees with bagging and random subspace ensembles

Bibliographic Details
Published in: Journal of hydrology (Amsterdam), Vol. 575, pp. 864–873
Main Authors: Chen, Wei; Hong, Haoyuan; Li, Shaojun; Shahabi, Himan; Wang, Yi; Wang, Xiaojing; Ahmad, Baharin Bin
Format: Journal Article
Language: English
Published: Elsevier B.V., 01-08-2019
Description
Summary:
• Novel hybrid Bag-REPTree and RS-REPTree ensemble frameworks for flood susceptibility.
• Optimization of input factors using ReliefF method.
• ROC, standard error, CI at 95%, and Wilcoxon signed-rank test were used for validation and comparison of the models.
• RS-REPTree has the highest prediction capability and proved the superiority of the ensemble method.

Flooding is a very common natural hazard that causes catastrophic effects worldwide. Recently, ensemble-based techniques have become popular in flood susceptibility modelling due to their greater strength and efficiency in the prediction of flood locations. Thus, the aim of this study was to employ machine learning-based reduced-error pruning trees (REPTree) with Bagging (Bag-REPTree) and Random subspace (RS-REPTree) ensemble frameworks for spatial prediction of flood susceptibility using a geographic information system (GIS). First, a flood spatial database was constructed with 363 flood locations and thirteen flood influencing factors, namely altitude, slope angle, slope aspect, curvature, stream power index (SPI), sediment transport index (STI), topographic wetness index (TWI), distance to rivers, normalized difference vegetation index (NDVI), soil, land use, lithology, and rainfall. Subsequently, correlation attribute evaluation (CAE) was used as the factor selection method for optimization of input factors. Finally, the receiver operating characteristic (ROC) curve, standard error (SE), 95% confidence interval (CI), and Wilcoxon signed-rank test were used to validate and compare the performance of the models. Results show that the RS-REPTree model has the highest prediction capability for flood susceptibility assessment, with the highest area under the ROC curve (AUC) (0.949 and 0.907), the smallest SE (0.011 and 0.023), and the narrowest 95% CI (0.928–0.970 and 0.863–0.952) for the training and validation datasets, respectively. It was followed by the Bag-REPTree and REPTree models, in that order. The results also proved the superiority of the ensemble methods over the standalone REPTree model.
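
The reported AUC, SE, and 95% CI values are related in a way that can be checked by hand: a normal-approximation interval is roughly AUC ± 1.96 × SE. The sketch below illustrates that relationship and one common way to estimate the SE of an AUC. It is not taken from the article. The Hanley-McNeil SE formula is an assumption, since the summary does not say which estimator the authors used, and the function names and toy scores are hypothetical.

```python
import math


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the fraction of (flood, non-flood) pairs ranked correctly.

    Ties between a flood and a non-flood score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC using the Hanley-McNeil approximation.

    Assumption: the summary does not name the SE estimator; this is one
    widely used choice, shown for illustration only.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc * auc)
        + (n_neg - 1) * (q2 - auc * auc)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def normal_ci(auc, se, z=1.96):
    """Normal-approximation confidence interval, clipped to [0, 1]."""
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical susceptibility scores for a handful of points.
flood = [0.9, 0.8, 0.7, 0.4]
non_flood = [0.6, 0.5, 0.3, 0.2]
toy_auc = auc_mann_whitney(flood, non_flood)
toy_se = hanley_mcneil_se(toy_auc, len(flood), len(non_flood))
print("toy AUC, SE, CI:", toy_auc, toy_se, normal_ci(toy_auc, toy_se))

# Intervals implied by the AUC and SE values quoted in the summary.
print("training:", normal_ci(0.949, 0.011))
print("validation:", normal_ci(0.907, 0.023))
```

Applied to the summary's RS-REPTree figures, the approximation gives intervals within about 0.001 of the reported 0.928–0.970 and 0.863–0.952. Differences in the third decimal place are expected because the published AUC and SE are themselves rounded.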
ISSN: 0022-1694, 1879-2707
DOI: 10.1016/j.jhydrol.2019.05.089