Application of structural topic modeling to aviation safety data
Data-driven frameworks for analyzing aviation safety data have recently gained traction. Text-based machine learning techniques often rely purely on word frequency analysis to eliminate the innate subjectivity of human language, but more refined techniques like structural topic modeling (STM) attemp...
Saved in:
Published in: | Reliability engineering & system safety Vol. 224; p. 108522 |
---|---|
Main Authors: | , , , |
Format: | Journal Article |
Language: | English |
Published: |
Barking
Elsevier Ltd
01-08-2022
Elsevier BV |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Data-driven frameworks for analyzing aviation safety data have recently gained traction. Text-based machine learning techniques often rely purely on word frequency analysis to eliminate the innate subjectivity of human language, but more refined techniques like structural topic modeling (STM) attempt to simulate text generation to identify the thematic undertones of text corpora. This paper presents an application of STM to two text-based sets of aviation safety data, the Aviation Safety Reporting System (ASRS) and accident and incident reports published by the National Transportation Safety Board (NTSB). A framework for cleaning and pre-processing the datasets is discussed, including a brief discussion of bag-of-words and TF–IDF representations of narratives. The methodology behind STM is described, including techniques for selecting the optimal number of topics. The results of the STM analysis on the ASRS and NTSB datasets are presented, with a focus on the clarity and specificity based on most common words associated with topics. A brief exploration of the correlation between pairs of topic labels is also undertaken, including a visualization of narratives in 2-dimensional space. STM is found to show promise in identifying themes within technical datasets, with model performance increasing for more specific corpora that use precise and unique language.
•Topic-based approach allows in-depth analysis of aviation text narratives.•Topic models better incorporate subjective elements of language than word frequency.•Non-exclusive topic assignment allows better exploration of key themes, correlations. |
---|---|
ISSN: | 0951-8320 1879-0836 |
DOI: | 10.1016/j.ress.2022.108522 |