Comparison of four statistical and machine learning methods for crash severity prediction

Bibliographic Details
Published in:Accident analysis and prevention Vol. 108; pp. 27 - 36
Main Authors: Iranitalab, Amirfarrokh, Khattak, Aemal
Format: Journal Article
Language:English
Published: England Elsevier Ltd 01-11-2017
Description
Summary:
•Four different classification methods were investigated for crash severity prediction.
•The effects of two data clustering methods on crash severity prediction were explored.
•A crash costs-related method was proposed for comparing the prediction methods.
•Comparison of the methods uncovered significant differences in their prediction performance.
•Practical suggestions for using the prediction models were provided.

Crash severity prediction models enable agencies to predict the severity of a reported crash whose severity is unknown, or the severity of crashes that may be expected to occur in the future. This paper had three main objectives: comparing the performance of four statistical and machine learning methods, namely Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash costs-based approach for comparing crash severity prediction methods; and investigating the effects of two data clustering methods, K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models.

The 2012–2015 reported crash data from Nebraska, United States, were obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012–2014) and validation (2015) subsets. The four prediction methods were trained/estimated on the training/estimation dataset, and the correct prediction rate for each crash severity level, the overall correct prediction rate and a proposed crash costs-based accuracy measure were obtained for the validation dataset.

The correct prediction rates and the proposed approach showed that NNC had the best prediction performance, both overall and for more severe crashes. RF and SVM had the next best performance, and MNL was the weakest method. Data clustering did not affect the prediction results of SVM. KC improved the prediction performance of MNL, NNC and RF, while LCC improved MNL and RF but weakened the performance of NNC. The overall correct prediction rate gave almost exactly the opposite results to the proposed approach, showing that neglecting crash costs can lead to misjudgment in choosing the right prediction method.
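The three kinds of measure named in the summary (correct prediction rate per severity level, overall correct prediction rate, and a cost-based measure) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the summary does not give the paper's formula, so `cost_error` is a hypothetical stand-in (misclassification cost relative to total observed cost), and the severity labels, unit costs and predictions are invented toy values, not data or results from the study.

```python
from collections import Counter

# Hypothetical relative unit costs per severity level (not from the paper).
UNIT_COST = {"PDO": 1, "injury": 10, "fatal": 100}


def per_class_rates(y_true, y_pred):
    """Correct prediction rate for each severity level present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {level: hits[level] / totals[level] for level in totals}


def overall_rate(y_true, y_pred):
    """Share of all crashes whose severity was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cost_error(y_true, y_pred, unit_cost=UNIT_COST):
    """Illustrative cost-based error: summed absolute cost gap between the
    predicted and observed severity, divided by the total observed cost."""
    gap = sum(abs(unit_cost[t] - unit_cost[p]) for t, p in zip(y_true, y_pred))
    return gap / sum(unit_cost[t] for t in y_true)


# Toy validation set: mostly property-damage-only (PDO) crashes.
y_true = ["PDO"] * 8 + ["injury", "fatal"]
# Model A always predicts the majority class.
model_a = ["PDO"] * 10
# Model B finds the severe crashes but over-predicts injury for four PDO crashes.
model_b = ["injury"] * 4 + ["PDO"] * 4 + ["injury", "fatal"]

for name, pred in (("A", model_a), ("B", model_b)):
    print(name, overall_rate(y_true, pred), round(cost_error(y_true, pred), 3),
          per_class_rates(y_true, pred))
```

In this toy case model A has the higher overall correct prediction rate, yet model B has the lower cost-based error because it identifies the costly severe crashes. That is the kind of reversal the summary warns about when crash costs are neglected.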
ISSN:0001-4575
1879-2057
DOI:10.1016/j.aap.2017.08.008