Machine Learning Ensembles and Rail Defects Prediction: Multilayer Stacking Methodology
Published in: | ASCE-ASME journal of risk and uncertainty in engineering systems. Part A, Civil Engineering Vol. 5; no. 4 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | Reston: American Society of Civil Engineers, 01-12-2019 |
Summary: | Machine learning has taken a front seat in railway big data analysis. This is partly due to perpetual data collection and the need for automated systems to expedite maintenance decisions. A case for track defect prognosis in rail track engineering is presented in this paper. Fatigue defects are very common and are influential on rail maintenance. Understanding such defects is essential for optimized maintenance scheduling. The literature is replete with machine learning models developed for defect prediction. Because no single machine learning model is guaranteed to surpass others with every kind of data, each model has its inherent deficiencies. Classifier ensembles such as bagging or boosting aggregate strengths from different models to enhance prediction. The outcome is very effective, although highly correlated. This work proposes a stacking method of combining average learners into powerful learning machines while considering memory, time, computational, structural complexities, and bias-variance trade-offs. Because of the large scale of rail infrastructure considered in this work (35,406 km), this study shows that classical Weibull analysis underestimates annual fatigue defects by at least 25% throughout rail life. The proposed stacking ensemble compensates for this shortfall by aggregating the probability predictions of diverse learners. These predictions were combined from a binary classification ensemble of 0.783 receiver operating characteristic area under curve (ROC-AUC) score with significant room for improvement in computation time and curve fitting. |
---|---|
Bibliography: | 04019016 |
ISSN: | 2376-7642 |
DOI: | 10.1061/AJRUA6.0001024 |
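
The summary describes aggregating the probability predictions of diverse learners and scoring the resulting binary classifier by ROC-AUC. The sketch below illustrates only that general idea and is not the paper's method: it uses a fixed weighted average as a simplified stand-in for stacking, which would instead train a meta-learner on the base learners' out-of-fold predictions. The function names `roc_auc` and `blend`, the labels, and both learners' probabilities are hypothetical toy values, not data from the study.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def blend(prob_columns: Sequence[Sequence[float]],
          weights: Sequence[float]) -> List[float]:
    """Weighted average of per-learner probabilities, one value per sample."""
    total = sum(weights)
    n_samples = len(prob_columns[0])
    return [
        sum(w * col[i] for w, col in zip(weights, prob_columns)) / total
        for i in range(n_samples)
    ]


# Hypothetical toy labels: 1 = defect observed, 0 = no defect.
y_true = [1, 0, 1, 0, 1, 0]
# Hypothetical probability outputs of two different base learners.
learner_a = [0.9, 0.6, 0.4, 0.3, 0.8, 0.5]
learner_b = [0.6, 0.2, 0.7, 0.6, 0.5, 0.1]

# Equal weights are an assumption; a stacking meta-learner would fit them.
blended = blend([learner_a, learner_b], weights=[1.0, 1.0])

for name, scores in [("A", learner_a), ("B", learner_b), ("blend", blended)]:
    print(name, round(roc_auc(y_true, scores), 3))
```

On this toy data the two learners make different ranking mistakes, so the blend separates the classes better than either does alone. Real base learners with correlated errors would gain less, which is the concern the summary raises about bagging and boosting.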