Machine Learning Ensembles and Rail Defects Prediction: Multilayer Stacking Methodology

Bibliographic Details
Published in: ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, Vol. 5, No. 4
Main Authors: Lasisi, Ahmed; Attoh-Okine, Nii
Format: Journal Article
Language: English
Published: Reston: American Society of Civil Engineers, 01-12-2019
Description
Summary: Machine learning has taken a front seat in railway big data analysis. This is partly due to perpetual data collection and the need for automated systems to expedite maintenance decisions. A case for track defect prognosis in rail track engineering is presented in this paper. Fatigue defects are very common and are influential on rail maintenance. Understanding such defects is essential for optimized maintenance scheduling. The literature is replete with machine learning models developed for defect prediction. Because no single machine learning model is guaranteed to surpass others with every kind of data, each model has its inherent deficiencies. Classifier ensembles such as bagging or boosting aggregate strengths from different models to enhance prediction. The outcome is very effective, although highly correlated. This work proposes a stacking method of combining average learners into powerful learning machines while considering memory, time, computational, structural complexities, and bias-variance trade-offs. Because of the large scale of rail infrastructure considered in this work (35,406 km), this study shows that classical Weibull analysis underestimates annual fatigue defects by at least 25% throughout rail life. The proposed stacking ensemble compensates for this shortfall by aggregating the probability predictions of diverse learners. These predictions were combined from a binary classification ensemble of 0.783 receiver operating characteristic area under curve (ROC-AUC) score with significant room for improvement in computation time and curve fitting.
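
The summary refers to aggregating the probability predictions of diverse learners and scoring the result by ROC-AUC. The sketch below illustrates only that general idea and is not the authors' method: an equal-weight average stands in for a trained stacking meta-learner, and the labels, learner names (learner_a, learner_b), and probabilities are hypothetical values invented for illustration. ROC-AUC is computed from its pairwise definition, the fraction of (defect, no-defect) pairs that the scores rank correctly.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def blend(prob_columns, weights):
    """Weighted average of per-learner probability predictions.

    Assumption: a fixed weighted average replaces the trained
    meta-learner that a real stacking ensemble would use.
    """
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*prob_columns)
    ]


# Hypothetical data: 1 = defect observed, 0 = no defect.
labels = [1, 0, 1, 0, 1, 0]
learner_a = [0.9, 0.6, 0.4, 0.2, 0.7, 0.5]  # hypothetical probabilities
learner_b = [0.6, 0.3, 0.8, 0.7, 0.5, 0.1]  # hypothetical probabilities

blended = blend([learner_a, learner_b], weights=[1.0, 1.0])

if __name__ == "__main__":
    for name, scores in [("a", learner_a), ("b", learner_b), ("blend", blended)]:
        print(name, round(roc_auc(labels, scores), 3))
```

In this toy data each learner misranks different cases, so the blended scores rank the cases better than either learner alone. A full stacking ensemble would instead fit the combiner on out-of-fold predictions from the base learners.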
Bibliography: 04019016
ISSN: 2376-7642
DOI: 10.1061/AJRUA6.0001024