Construction accident narrative classification: An evaluation of text mining techniques

Bibliographic Details
Published in: Accident Analysis and Prevention, Vol. 108, pp. 122-130
Main Authors: Goh, Yang Miang; Ubeynarayana, C.U.
Format: Journal Article
Language: English
Published: Elsevier Ltd, England, 1 November 2017
Abstract
• Evaluated six machine learning algorithms for classifying accident narratives into 11 accident types.
• Found that support vector machine (SVM) produced the best performance.
• Across the 11 accident types, the average precision of the SVM was 0.73, average recall was 0.63, and average F1 score was 0.67.
• Commonly mislabeled cases were examined using a confusion matrix and a qualitative review of the mislabeled cases.
• A set of 1,000 labeled accident narratives and more than 3,000 unlabeled narratives were made publicly available.

Learning from past accidents is fundamental to accident prevention. Thus, organizations and regulators encourage accident and near-miss reporting. However, for organizations managing large safety databases, the time needed to classify accident and near-miss narratives accurately can be very significant. This study evaluated the utility of various text mining classification techniques for classifying 1,000 publicly available construction accident narratives obtained from the US OSHA website. Six machine learning algorithms were evaluated: support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB). SVM produced the best performance in classifying the test set of 251 cases. Further experiments were conducted with tokenization of the processed text and with non-linear SVM, and a grid search was conducted on the hyperparameters of the SVM models. The best-performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with unigram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9, and F1 score ranged from 0.45 to 0.92. The reasons for misclassification were discussed, and suggestions for improving performance were provided.
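The abstract reports that unigram tokenization worked best but does not describe the preprocessing pipeline. As a rough illustration only, the stdlib Python sketch below lowercases a narrative, drops a small stop-word list, builds n-grams (n=1 for unigrams, n=2 for bigrams) and weights them with TF-IDF. The function names, the stop-word list and the choice of TF-IDF weighting with a smoothed idf are assumptions, not the study's documented method.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real pipelines use a fuller one.
STOP_WORDS = {"a", "an", "the", "was", "by", "of", "from", "to", "and"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def ngrams(tokens, n=1):
    """n=1 gives unigrams, n=2 gives bigrams (joined with a space)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Return one L2-normalized {term: weight} dict per document.

    Uses the smoothed idf ln((1 + N) / (1 + df)) + 1, one common variant (an assumption).
    """
    term_counts = [Counter(ngrams(tokenize(d), n)) for d in docs]
    num_docs = len(docs)
    # Iterating a Counter yields each term once, so this counts document frequency.
    df = Counter(term for tc in term_counts for term in tc)
    vectors = []
    for tc in term_counts:
        vec = {t: c * (math.log((1 + num_docs) / (1 + df[t])) + 1) for t, c in tc.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


docs = [
    "Worker fell from the roof edge.",       # hypothetical narratives
    "Worker was struck by a falling beam.",
]
for vec in tfidf(docs, n=1):
    print(sorted(vec.items(), key=lambda kv: -kv[1])[:3])
```

Terms shared by every narrative (here "worker") receive the lowest idf, so distinctive words such as "fell" or "struck" dominate each vector; switching `n` to 2 is one way to compare unigram and bigram tokenization.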
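A linear SVM over unigram features can likewise be sketched without external libraries. This version rests on several assumptions: it trains one binary SVM per accident type (one-vs-rest) using Pegasos-style hinge-loss subgradient steps visited in a fixed cyclic order, folds the intercept into a constant `__bias__` feature, and grid-searches the regularization strength `lam` on a held-out validation set. Here `lam` plays roughly the role of 1/(nC) for the usual SVM parameter C with n training examples. It does not cover the RBF kernel, and all function names, toy narratives and grid values are hypothetical.

```python
import re
from collections import Counter


def featurize(text):
    """Bag of unigram counts plus a constant bias feature (a simplification)."""
    feats = Counter(re.findall(r"[a-z]+", text.lower()))
    feats["__bias__"] = 1
    return feats


def dot(w, x):
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(X, y, lam, epochs=20):
    """Linear SVM via Pegasos-style hinge-loss subgradient steps; y holds +1/-1.

    Examples are visited in a fixed order instead of sampled at random,
    so the result is deterministic.
    """
    w, t = {}, 0
    for _ in range(epochs):
        for x, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            violated = yi * dot(w, x) < 1  # margin measured with the current w
            # Gradient step on the L2 penalty (lam / 2) * ||w||^2 shrinks every weight.
            w = {f: (1.0 - eta * lam) * v for f, v in w.items()}
            if violated:
                # Hinge loss is active: move w toward classifying x correctly.
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * yi * v
    return w


def train_one_vs_rest(texts, labels, lam):
    """One binary SVM per accident type (that type vs. all others)."""
    X = [featurize(t) for t in texts]
    return {
        c: train_binary_svm(X, [1 if l == c else -1 for l in labels], lam)
        for c in sorted(set(labels))
    }


def predict(models, text):
    x = featurize(text)
    return max(models, key=lambda c: dot(models[c], x))


def grid_search(train, valid, lams):
    """Return (lam, accuracy) for the lam with the best validation accuracy."""
    best_lam, best_acc = None, -1.0
    for lam in lams:
        models = train_one_vs_rest([t for t, _ in train], [l for _, l in train], lam)
        acc = sum(predict(models, t) == l for t, l in valid) / len(valid)
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    return best_lam, best_acc


# Hypothetical narratives, not drawn from the study's OSHA dataset.
train = [
    ("worker fell from the roof edge", "falls"),
    ("employee fell through a skylight opening", "falls"),
    ("laborer fell off the ladder", "falls"),
    ("worker was struck by a falling beam", "struck_by"),
    ("employee struck by swinging crane load", "struck_by"),
    ("worker struck by reversing truck", "struck_by"),
    ("electrician contacted an energized power line", "electrocution"),
    ("worker electrocuted by live wire", "electrocution"),
    ("employee touched energized conductor", "electrocution"),
]
valid = [
    ("carpenter fell from scaffold", "falls"),
    ("worker struck by excavator bucket", "struck_by"),
    ("lineman contacted energized line", "electrocution"),
]
best_lam, best_acc = grid_search(train, valid, [0.001, 0.01, 0.1, 1.0])
print(f"best lam={best_lam}, validation accuracy={best_acc:.2f}")
```

In practice an established library implementation would normally replace the hand-rolled trainer, and the search would also cover the kernel width for an RBF SVM. The sketch only shows the mechanics of one-vs-rest training and hyperparameter selection.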
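The per-label precision, recall and F1 figures, and the confusion-matrix review of mislabeled cases, can be outlined with a few lines of stdlib Python. The sketch below builds a confusion matrix from true and predicted labels, derives per-class scores, macro-averages them, and lists the most frequent off-diagonal cells. The function names, labels and predictions are hypothetical toy values. Averaging per-class F1 scores, as `macro_average` does, generally gives a different number from applying the F1 formula to averaged precision and recall; the abstract does not say which convention produced the reported averages.

```python
def confusion_matrix(y_true, y_pred, labels):
    """cm[true][pred] = number of narratives with that (true, predicted) pair."""
    cm = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_scores(cm):
    """Precision, recall and F1 for each label; 0.0 where a ratio is undefined."""
    labels = list(cm)
    scores = {}
    for c in labels:
        tp = cm[c][c]
        fp = sum(cm[t][c] for t in labels if t != c)  # other types predicted as c
        fn = sum(cm[c][p] for p in labels if p != c)  # c predicted as another type
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class precision, recall and F1."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


def top_confusions(cm, k=3):
    """Most frequent off-diagonal (true, predicted, count) cells."""
    cells = [(t, p, n) for t, row in cm.items() for p, n in row.items() if t != p and n]
    return sorted(cells, key=lambda cell: cell[2], reverse=True)[:k]


labels = ["falls", "struck_by", "electrocution"]  # hypothetical accident types
y_true = ["falls", "falls", "falls", "struck_by", "struck_by", "electrocution"]
y_pred = ["falls", "falls", "struck_by", "struck_by", "falls", "electrocution"]
cm = confusion_matrix(y_true, y_pred, labels)
for label, (p, r, f1) in per_class_scores(cm).items():
    print(f"{label:15s} P={p:.2f} R={r:.2f} F1={f1:.2f}")
print("macro P/R/F1:", [round(v, 2) for v in macro_average(per_class_scores(cm))])
print("top confusions:", top_confusions(cm))
```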
Author Ubeynarayana, C.U.
Goh, Yang Miang
Author_xml – sequence: 1
  givenname: Yang Miang
  surname: Goh
  fullname: Goh, Yang Miang
  email: bdggym@nus.edu.sg
– sequence: 2
  givenname: C.U.
  surname: Ubeynarayana
  fullname: Ubeynarayana, C.U.
BackLink https://www.ncbi.nlm.nih.gov/pubmed/28865927 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1016_j_autcon_2023_105020
crossref_primary_10_3390_aerospace9040178
crossref_primary_10_3390_app11020821
crossref_primary_10_1016_j_ijinfomgt_2022_102495
crossref_primary_10_1016_j_jsr_2019_10_006
crossref_primary_10_1016_j_autcon_2020_103089
crossref_primary_10_1016_j_aap_2023_107011
crossref_primary_10_1016_j_autcon_2021_104059
crossref_primary_10_1016_j_ssci_2023_106157
crossref_primary_10_1016_j_heliyon_2024_e26410
crossref_primary_10_3390_app14020664
crossref_primary_10_1016_j_autcon_2021_103915
crossref_primary_10_1016_j_autcon_2020_103517
crossref_primary_10_1016_j_jsr_2021_12_024
crossref_primary_10_2139_ssrn_4811253
crossref_primary_10_1080_15623599_2022_2159630
crossref_primary_10_3390_app122110765
crossref_primary_10_1016_j_aei_2019_02_009
crossref_primary_10_3390_axioms11100547
crossref_primary_10_1016_j_eswa_2022_117281
crossref_primary_10_1016_j_ssci_2023_106381
crossref_primary_10_1061__ASCE_ME_1943_5479_0000738
crossref_primary_10_1016_j_neucom_2021_01_089
crossref_primary_10_1016_j_techfore_2023_122347
crossref_primary_10_1016_j_autcon_2021_103987
crossref_primary_10_1016_j_aap_2023_107261
crossref_primary_10_1016_j_aei_2020_101152
crossref_primary_10_1002_eng2_12773
crossref_primary_10_1177_1748006X221139906
crossref_primary_10_3390_app12125781
crossref_primary_10_1016_j_heliyon_2022_e12088
crossref_primary_10_3390_app12052482
crossref_primary_10_1111_risa_13651
crossref_primary_10_1061__ASCE_CO_1943_7862_0002354
crossref_primary_10_3390_app131910599
crossref_primary_10_1016_j_aap_2020_105578
crossref_primary_10_1016_j_aap_2021_105973
crossref_primary_10_1016_j_autcon_2022_104670
crossref_primary_10_1016_j_aei_2023_101929
crossref_primary_10_1016_j_jsr_2024_02_006
crossref_primary_10_1016_j_nlp_2023_100007
crossref_primary_10_1016_j_ssci_2020_104900
crossref_primary_10_3390_su142416846
crossref_primary_10_1016_j_autcon_2020_103145
crossref_primary_10_1016_j_ssci_2021_105363
crossref_primary_10_1109_OJITS_2023_3335817
crossref_primary_10_1016_j_autcon_2020_103265
crossref_primary_10_1016_j_autcon_2023_105200
crossref_primary_10_17341_gazimmfd_1131524
crossref_primary_10_1016_j_autcon_2024_105443
crossref_primary_10_3390_knowledge2030021
crossref_primary_10_3390_buildings13051169
crossref_primary_10_1016_j_psep_2021_11_004
crossref_primary_10_1016_j_tust_2022_104616
crossref_primary_10_3390_ijerph191610209
crossref_primary_10_1177_03611981221103229
crossref_primary_10_1016_j_aei_2023_102050
crossref_primary_10_1061_JCEMD4_COENG_14669
crossref_primary_10_3390_en16031196
crossref_primary_10_1055_a_1863_7176
crossref_primary_10_1016_j_eswa_2022_118352
crossref_primary_10_1016_j_aei_2024_102507
crossref_primary_10_1016_j_autcon_2024_105458
crossref_primary_10_1080_13467581_2024_2373818
crossref_primary_10_1007_s13369_023_07964_w
crossref_primary_10_1016_j_autcon_2022_104169
crossref_primary_10_1016_j_ress_2020_107352
crossref_primary_10_1016_j_ssci_2018_12_006
crossref_primary_10_3390_info12110451
crossref_primary_10_1016_j_psep_2020_08_006
crossref_primary_10_1108_ECAM_06_2022_0603
crossref_primary_10_1016_j_ssci_2019_06_034
crossref_primary_10_1177_03611981211003581
crossref_primary_10_1016_j_autcon_2021_103608
crossref_primary_10_1016_j_autcon_2024_105343
crossref_primary_10_1080_15623599_2019_1683692
crossref_primary_10_1061__ASCE_CO_1943_7862_0002308
crossref_primary_10_3390_buildings14061797
crossref_primary_10_1590_0103_6513_20210048
crossref_primary_10_1016_j_ssci_2021_105261
crossref_primary_10_3846_transport_2021_14329
crossref_primary_10_21597_jist_1285239
crossref_primary_10_1016_j_autcon_2018_12_016
crossref_primary_10_1061_JMENEA_MEENG_5485
crossref_primary_10_1007_s00521_021_06780_3
crossref_primary_10_1016_j_autcon_2019_102974
crossref_primary_10_1016_j_ssci_2023_106113
crossref_primary_10_1108_ECAM_04_2021_0303
crossref_primary_10_1016_j_aei_2020_101060
crossref_primary_10_1061_JMENEA_MEENG_5516
crossref_primary_10_1061__ASCE_CO_1943_7862_0002382
crossref_primary_10_1080_19439962_2019_1597795
crossref_primary_10_1109_ACCESS_2023_3304328
crossref_primary_10_1177_1071181320641034
crossref_primary_10_1016_j_aap_2021_106019
crossref_primary_10_1016_j_ssci_2020_104616
crossref_primary_10_3390_ijerph18115573
crossref_primary_10_1016_j_aap_2023_107224
crossref_primary_10_1016_j_autcon_2022_104304
crossref_primary_10_1016_j_aei_2022_101752
crossref_primary_10_3390_app13126983
crossref_primary_10_3389_fpubh_2022_984099
crossref_primary_10_1177_03611981211001385
crossref_primary_10_7717_peerj_cs_1985
crossref_primary_10_1061_JCEMD4_COENG_13523
crossref_primary_10_1016_j_ssci_2021_105528
crossref_primary_10_1080_10803548_2022_2118983
crossref_primary_10_36680_j_itcon_2022_045
crossref_primary_10_1177_03611981221106786
crossref_primary_10_1016_j_tust_2023_105157
crossref_primary_10_1177_20552076231185674
crossref_primary_10_1016_j_autcon_2022_104351
crossref_primary_10_3233_WOR_220533
crossref_primary_10_36680_j_itcon_2023_013
crossref_primary_10_1016_j_tust_2021_103852
crossref_primary_10_3390_su132413579
crossref_primary_10_1016_j_autcon_2024_105522
crossref_primary_10_3390_app10175754
crossref_primary_10_1016_j_ocecoaman_2023_106660
crossref_primary_10_1061_JCEMD4_COENG_12848
crossref_primary_10_1002_cpe_7277
crossref_primary_10_3389_fbuil_2021_690071
crossref_primary_10_1155_2023_4181159
crossref_primary_10_3390_safety5020033
crossref_primary_10_1002_cpe_7437
crossref_primary_10_3390_ijerph21070831
crossref_primary_10_1002_prs_12556
crossref_primary_10_1061_JCEMD4_COENG_14080
crossref_primary_10_1016_j_jobe_2024_109330
crossref_primary_10_1016_j_ssci_2022_106023
crossref_primary_10_1016_j_ssci_2024_106468
crossref_primary_10_1016_j_aei_2021_101355
crossref_primary_10_1108_ECAM_04_2022_0305
crossref_primary_10_1016_j_autcon_2021_103896
crossref_primary_10_1108_ECAM_09_2021_0797
crossref_primary_10_1007_s11831_023_09938_5
crossref_primary_10_3390_app14041352
crossref_primary_10_1016_j_psep_2021_05_036
crossref_primary_10_1061_JCEMD4_COENG_13023
crossref_primary_10_1016_j_ssci_2020_105130
crossref_primary_10_1061_JCEMD4_COENG_14114
crossref_primary_10_1061_JCEMD4_COENG_14515
crossref_primary_10_1016_j_wpi_2023_102259
crossref_primary_10_1061_JCEMD4_COENG_13549
crossref_primary_10_1016_j_procs_2023_10_515
Cites_doi 10.1016/j.autcon.2012.11.037
10.1186/s12911-021-01695-4
10.1016/j.ssci.2014.10.006
10.1016/j.jsr.2012.10.012
10.1186/1472-6947-15-S1-S5
10.1016/j.aap.2013.09.012
10.1016/j.autcon.2016.01.001
10.1186/1472-6947-10-19
10.1016/j.aap.2009.09.020
10.1136/ip.2010.030593
10.1061/(ASCE)0733-9364(2004)130:4(542)
10.1136/injuryprev-2015-041813
10.1016/j.autcon.2015.11.001
10.1016/j.autcon.2014.02.014
10.1023/A:1009715923555
10.1016/j.autcon.2012.10.014
10.1002/(SICI)1097-4571(199401)45:1<12::AID-ASI2>3.0.CO;2-L
10.1016/j.aap.2015.03.018
10.1016/j.compind.2015.09.005
10.1002/cpe.3040
10.1145/944012.944013
10.1145/505282.505283
ContentType Journal Article
Copyright 2017 Elsevier Ltd
Copyright © 2017 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.aap.2017.08.026
DatabaseName MEDLINE
MEDLINE (Ovid)
PubMed
CrossRef
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
CrossRef
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE
Database_xml – sequence: 1
  dbid: ECM
  name: MEDLINE
  url: https://search.ebscohost.com/login.aspx?direct=true&db=cmedm&site=ehost-live
  sourceTypes: Index Database
DeliveryMethod fulltext_linktorsrc
Discipline Social Welfare & Social Work
Public Health
EISSN 1879-2057
EndPage 130
ExternalDocumentID 10_1016_j_aap_2017_08_026
28865927
S0001457517303068
Genre Validation Studies
Evaluation Studies
Journal Article
ISSN 0001-4575
IngestDate Fri Oct 25 00:11:48 EDT 2024
Thu Sep 26 17:43:06 EDT 2024
Sat Sep 28 08:50:34 EDT 2024
Fri Feb 23 02:33:12 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Text mining
Construction safety
Support vector machine
Data mining
Accident classification
Language English
License Copyright © 2017 Elsevier Ltd. All rights reserved.
LinkModel OpenURL
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
PMID 28865927
PQID 1935385688
PQPubID 23479
PageCount 9
ParticipantIDs proquest_miscellaneous_1935385688
crossref_primary_10_1016_j_aap_2017_08_026
pubmed_primary_28865927
elsevier_sciencedirect_doi_10_1016_j_aap_2017_08_026
PublicationCentury 2000
PublicationDate November 2017
2017-Nov
2017-11-00
20171101
PublicationDateYYYYMMDD 2017-11-01
PublicationDate_xml – month: 11
  year: 2017
  text: November 2017
PublicationDecade 2010
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Accident analysis and prevention
PublicationTitleAlternate Accid Anal Prev
PublicationYear 2017
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
  doi: 10.1016/j.autcon.2014.02.014
  contributor:
    fullname: Williams
– year: 2016
  ident: 10.1016/j.aap.2017.08.026_bib0100
  contributor:
    fullname: Python Software Foundation
– year: 2016
  ident: 10.1016/j.aap.2017.08.026_bib0185
  contributor:
    fullname: scikit-learn Community
– volume: 2
  start-page: 121
  issue: 2
  year: 1998
  ident: 10.1016/j.aap.2017.08.026_bib0030
  article-title: A tutorial on support vector machines for pattern recognition
  publication-title: Data Min. Knowl. Discovery
  doi: 10.1023/A:1009715923555
  contributor:
    fullname: Burges
– volume: 34
  start-page: 85
  year: 2013
  ident: 10.1016/j.aap.2017.08.026_bib0055
  article-title: Retrieving similar cases for alternative dispute resolution in construction accidents using text mining techniques
  publication-title: Autom. Constr.
  doi: 10.1016/j.autcon.2012.10.014
  contributor:
    fullname: Fan
– year: 2011
  ident: 10.1016/j.aap.2017.08.026_bib0160
  contributor:
    fullname: Williams
– volume: 45
  start-page: 12
  issue: 1
  year: 1994
  ident: 10.1016/j.aap.2017.08.026_bib0025
  article-title: The relationship between recall and precision
  publication-title: J. Am. Soc. Inf. Sci. (1986–1998)
  doi: 10.1002/(SICI)1097-4571(199401)45:1<12::AID-ASI2>3.0.CO;2-L
  contributor:
    fullname: Buckland
– volume: 79
  start-page: 41
  year: 2015
  ident: 10.1016/j.aap.2017.08.026_bib0150
  article-title: Machine learning approaches to analysing textual injury surveillance data: a systematic review
  publication-title: Accid. Anal. Prev.
  doi: 10.1016/j.aap.2015.03.018
  contributor:
    fullname: Vallmuur
– year: 2016
  ident: 10.1016/j.aap.2017.08.026_bib0060
  contributor:
    fullname: Goh
– year: 2009
  ident: 10.1016/j.aap.2017.08.026_bib0010
  contributor:
    fullname: Bird
– year: 2016
  ident: 10.1016/j.aap.2017.08.026_bib0170
  contributor:
    fullname: Workplace Safety and Health Institute
– volume: 78
  start-page: 80
  year: 2015
  ident: 10.1016/j.aap.2017.08.026_bib0125
  article-title: Natural language processing for aviation safety reports: from classification to interactive analysis
  publication-title: Comput. Ind.
  doi: 10.1016/j.compind.2015.09.005
  contributor:
    fullname: Tanguy
– year: 2016
  ident: 10.1016/j.aap.2017.08.026_bib0070
  contributor:
    fullname: Leximancer Pty Ltd
– volume: 26
  start-page: 728
  issue: 3
  year: 2014
  ident: 10.1016/j.aap.2017.08.026_bib0095
  article-title: PU text classification enhanced by term frequency–inverse document frequency-improved weighting
  publication-title: Concurrency Comput. Pract. Experience
  doi: 10.1002/cpe.3040
  contributor:
    fullname: Peng
– volume: 21
  start-page: 315
  issue: 4
  year: 2003
  ident: 10.1016/j.aap.2017.08.026_bib0140
  article-title: Measuring praise and criticism: inference of semantic orientation from association
  publication-title: ACM Trans. Inf. Syst.
  doi: 10.1145/944012.944013
  contributor:
    fullname: Turney
– volume: 34
  start-page: 1
  issue: 1
  year: 2002
  ident: 10.1016/j.aap.2017.08.026_bib0115
  article-title: Machine learning in automated text categorization
  publication-title: ACM Comput. Surv.
  doi: 10.1145/505282.505283
  contributor:
    fullname: Sebastiani
StartPage 122
SubjectTerms Accident classification
Accidents
Accidents, Occupational - classification
Algorithms
Bayes Theorem
Construction Industry
Construction safety
Data mining
Data Mining - methods
Databases, Factual
Decision Trees
Humans
Linear Models
Machine Learning - standards
Narration
Reproducibility of Results
Safety
Support Vector Machine
Text mining
Title Construction accident narrative classification: An evaluation of text mining techniques
URI https://dx.doi.org/10.1016/j.aap.2017.08.026
https://www.ncbi.nlm.nih.gov/pubmed/28865927
https://search.proquest.com/docview/1935385688
Volume 108
EndPage 130
ISSN 0001-4575
EISSN 1879-2057