Application of structural topic modeling to aviation safety data

Bibliographic Details
Published in: Reliability Engineering & System Safety, Vol. 224, p. 108522
Main Authors: Rose, Rodrigo L.; Puranik, Tejas G.; Mavris, Dimitri N.; Rao, Arjun H.
Format: Journal Article
Language: English
Published: Barking, Elsevier Ltd, 01-08-2022
Elsevier BV
Description
Summary: Data-driven frameworks for analyzing aviation safety data have recently gained traction. Text-based machine learning techniques often rely purely on word frequency analysis to eliminate the innate subjectivity of human language, but more refined techniques like structural topic modeling (STM) attempt to simulate text generation to identify the thematic undertones of text corpora. This paper presents an application of STM to two text-based sets of aviation safety data, the Aviation Safety Reporting System (ASRS) and accident and incident reports published by the National Transportation Safety Board (NTSB). A framework for cleaning and pre-processing the datasets is discussed, including a brief discussion of bag-of-words and TF–IDF representations of narratives. The methodology behind STM is described, including techniques for selecting the optimal number of topics. The results of the STM analysis on the ASRS and NTSB datasets are presented, with a focus on the clarity and specificity based on most common words associated with topics. A brief exploration of the correlation between pairs of topic labels is also undertaken, including a visualization of narratives in 2-dimensional space. STM is found to show promise in identifying themes within technical datasets, with model performance increasing for more specific corpora that use precise and unique language.
• Topic-based approach allows in-depth analysis of aviation text narratives.
• Topic models better incorporate subjective elements of language than word frequency.
• Non-exclusive topic assignment allows better exploration of key themes, correlations.
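The bag-of-words and TF–IDF representations mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' pre-processing pipeline: the narratives below are hypothetical toy strings (not ASRS or NTSB data), the function names `bag_of_words` and `tf_idf` are made up for illustration, and the weighting shown (term count divided by document length, times log of N over document frequency) is only one common TF–IDF variant.

```python
import math
from collections import Counter

# Hypothetical toy narratives; not drawn from ASRS or NTSB data.
narratives = [
    "engine failure during climb",
    "runway incursion during taxi",
    "engine fire warning during taxi",
]

def bag_of_words(text):
    """Count how often each whitespace-separated token occurs."""
    return Counter(text.lower().split())

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = log(N / df),
    one common variant among several.
    """
    bows = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for bow in bows for term in bow)
    weights = []
    for bow in bows:
        total = sum(bow.values())
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bow.items()
        })
    return weights

bows = [bag_of_words(d) for d in narratives]
weights = tf_idf(narratives)
```

In this toy corpus the word "during" appears in every narrative, so its weight is zero, while a word found in only one narrative (such as "failure") is weighted more heavily than one shared by two (such as "engine").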
ISSN: 0951-8320, 1879-0836
DOI: 10.1016/j.ress.2022.108522