Construction accident narrative classification: An evaluation of text mining techniques
Published in: | Accident analysis and prevention Vol. 108; pp. 122 - 130 |
---|---|
Main Authors: | Goh, Yang Miang; Ubeynarayana, C.U. |
Format: | Journal Article |
Language: | English |
Published: | England: Elsevier Ltd, 01-11-2017 |
Abstract | • Evaluated six machine learning algorithms for classifying accident narratives according to 11 accident types.
• Found that the support vector machine (SVM) produced the best performance.
• Across the 11 accident types, the average precision of the SVM was 0.73, average recall was 0.63, and average F1 score was 0.67.
• Commonly mislabeled cases were evaluated using a confusion matrix and a qualitative evaluation of the mislabeled cases.
• A set of 1,000 labeled accident narratives and more than 3,000 unlabeled narratives was made publicly available.
Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives can be substantial. This study aims to evaluate the utility of various text mining classification techniques in classifying 1,000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and with non-linear SVM was also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. The best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with unigram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9, and F1 score ranged from 0.45 to 0.92. The reasons for misclassification are discussed, and ways to improve performance are suggested. |
---|---|
Author | Ubeynarayana, C.U. Goh, Yang Miang |
Author_xml | – sequence: 1 givenname: Yang Miang surname: Goh fullname: Goh, Yang Miang email: bdggym@nus.edu.sg – sequence: 2 givenname: C.U. surname: Ubeynarayana fullname: Ubeynarayana, C.U. |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/28865927$$D View this record in MEDLINE/PubMed |
Cites_doi | 10.1016/j.autcon.2012.11.037 10.1186/s12911-021-01695-4 10.1016/j.ssci.2014.10.006 10.1016/j.jsr.2012.10.012 10.1186/1472-6947-15-S1-S5 10.1016/j.aap.2013.09.012 10.1016/j.autcon.2016.01.001 10.1186/1472-6947-10-19 10.1016/j.aap.2009.09.020 10.1136/ip.2010.030593 10.1061/(ASCE)0733-9364(2004)130:4(542) 10.1136/injuryprev-2015-041813 10.1016/j.autcon.2015.11.001 10.1016/j.autcon.2014.02.014 10.1023/A:1009715923555 10.1016/j.autcon.2012.10.014 10.1002/(SICI)1097-4571(199401)45:1<12::AID-ASI2>3.0.CO;2-L 10.1016/j.aap.2015.03.018 10.1016/j.compind.2015.09.005 10.1002/cpe.3040 10.1145/944012.944013 10.1145/505282.505283 |
ContentType | Journal Article |
Copyright | 2017 Elsevier Ltd Copyright © 2017 Elsevier Ltd. All rights reserved. |
Copyright_xml | – notice: 2017 Elsevier Ltd – notice: Copyright © 2017 Elsevier Ltd. All rights reserved. |
DBID | CGR CUY CVF ECM EIF NPM AAYXX CITATION 7X8 |
DOI | 10.1016/j.aap.2017.08.026 |
DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed CrossRef MEDLINE - Academic |
DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) CrossRef MEDLINE - Academic |
DatabaseTitleList | MEDLINE - Academic MEDLINE |
Database_xml | – sequence: 1 dbid: ECM name: MEDLINE url: https://search.ebscohost.com/login.aspx?direct=true&db=cmedm&site=ehost-live sourceTypes: Index Database |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Social Welfare & Social Work Public Health |
EISSN | 1879-2057 |
EndPage | 130 |
ExternalDocumentID | 10_1016_j_aap_2017_08_026 28865927 S0001457517303068 |
Genre | Validation Studies Evaluation Studies Journal Article |
ISSN | 0001-4575 |
IngestDate | Fri Oct 25 00:11:48 EDT 2024 Thu Sep 26 17:43:06 EDT 2024 Sat Sep 28 08:50:34 EDT 2024 Fri Feb 23 02:33:12 EST 2024 |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Text mining Construction safety Support vector machine Data mining Accident classification |
Language | English |
License | Copyright © 2017 Elsevier Ltd. All rights reserved. |
LinkModel | OpenURL |
Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
PMID | 28865927 |
PQID | 1935385688 |
PQPubID | 23479 |
PageCount | 9 |
ParticipantIDs | proquest_miscellaneous_1935385688 crossref_primary_10_1016_j_aap_2017_08_026 pubmed_primary_28865927 elsevier_sciencedirect_doi_10_1016_j_aap_2017_08_026 |
PublicationCentury | 2000 |
PublicationDate | November 2017 2017-Nov 2017-11-00 20171101 |
PublicationDateYYYYMMDD | 2017-11-01 |
PublicationDate_xml | – month: 11 year: 2017 text: November 2017 |
PublicationDecade | 2010 |
PublicationPlace | England |
PublicationPlace_xml | – name: England |
PublicationTitle | Accident analysis and prevention |
PublicationTitleAlternate | Accid Anal Prev |
PublicationYear | 2017 |
Publisher | Elsevier Ltd |
Publisher_xml | – name: Elsevier Ltd |
StartPage | 122 |
SubjectTerms | Accident classification; Accidents; Accidents, Occupational - classification; Algorithms; Bayes Theorem; Construction Industry; Construction safety; Data mining; Data Mining - methods; Databases, Factual; Decision Trees; Humans; Linear Models; Machine Learning - standards; Narration; Reproducibility of Results; Safety; Support Vector Machine; Text mining |
Title | Construction accident narrative classification: An evaluation of text mining techniques |
URI | https://dx.doi.org/10.1016/j.aap.2017.08.026 https://www.ncbi.nlm.nih.gov/pubmed/28865927 https://search.proquest.com/docview/1935385688 |
Volume | 108 |