KCB-FLAT: Enhancing Chinese Named Entity Recognition with Syntactic Information and Boundary Smoothing Techniques
Published in: | Mathematics (Basel) Vol. 12; no. 17; p. 2714 |
---|---|
Main Authors: | Deng, Zhenrong; Huang, Zheng; Wei, Shiwei; Zhang, Jinglin |
Format: | Journal Article |
Language: | English |
Published: | Basel: MDPI AG, 2024-09-01 |
Abstract | Named entity recognition (NER) is a fundamental task in natural language processing (NLP). During training, NER models tend to become over-confident. The problem is especially acute in Chinese NER, which depends on word segmentation: erroneous entity boundary segmentation exacerbates over-confidence and reduces the model’s overall performance. These issues limit further improvement of NER models. To tackle them, we propose a new model named KCB-FLAT, designed to enhance Chinese NER performance by integrating enriched semantic information with the word-boundary smoothing technique. Specifically, we first extract several types of syntactic data and encode them with a Key-Value Memory Network, which integrates the syntactic information through an attention mechanism to generate syntactic feature embeddings for Chinese characters. Next, we employ an encoder named Cross-Transformer to thoroughly combine syntactic and lexical information, addressing the entity boundary segmentation errors caused by lexical information. Finally, we introduce a Boundary Smoothing module, combined with a regularity-conscious function, to capture the internal regularity of each entity and to reduce the model’s over-confidence in entity probabilities through smoothing. Experimental results show that the proposed model achieves strong F1 scores on the MSRA, Resume, Weibo, and self-built ZJ datasets. |
---|---|
Author | Wei, Shiwei; Deng, Zhenrong; Huang, Zheng; Zhang, Jinglin |
Author_xml | – sequence: 1 givenname: Zhenrong surname: Deng fullname: Deng, Zhenrong – sequence: 2 givenname: Zheng surname: Huang fullname: Huang, Zheng – sequence: 3 givenname: Shiwei orcidid: 0000-0003-3610-4111 surname: Wei fullname: Wei, Shiwei – sequence: 4 givenname: Jinglin surname: Zhang fullname: Zhang, Jinglin |
Cites_doi | 10.18653/v1/E17-2113 10.3390/app10175792 10.1162/tacl_a_00104 10.18653/v1/2020.acl-main.528 10.3390/sym12121986 10.18653/v1/D19-1519 10.18653/v1/2020.acl-main.611 10.18653/v1/2023.findings-acl.89 10.3390/s23239402 10.18653/v1/2020.acl-main.577 10.18653/v1/2020.acl-main.519 10.18653/v1/2022.acl-long.428 10.3390/app11188319 10.3390/w15061197 10.24963/ijcai.2021/542 10.3390/app12115373 10.18653/v1/2022.findings-acl.146 10.18653/v1/2022.findings-naacl.143 10.3390/s20061652 10.1609/aaai.v36i10.21344 10.18653/v1/2022.findings-acl.155 10.18653/v1/2020.emnlp-main.27 10.3390/healthcare11091268 10.3390/sym13050786 10.18653/v1/2023.acl-long.215 10.18653/v1/P16-1101 10.21203/rs.3.rs-1805659/v1 10.3390/app12157708 10.18653/v1/2020.findings-emnlp.378 10.18653/v1/D18-1309 10.3390/robotics13070106 10.1111/exsy.13553 10.18653/v1/2021.acl-long.216 10.18653/v1/2021.acl-long.121 10.18653/v1/2020.acl-main.703 10.3390/s23041771 |
ContentType | Journal Article |
Copyright | 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.3390/math12172714 |
Discipline | Mathematics |
EISSN | 2227-7390 |
ExternalDocumentID | oai_doaj_org_article_ccc7750229f74a7fbcd814381879a0c3 10_3390_math12172714 |
GeographicLocations | China |
GeographicLocations_xml | – name: China |
ISSN | 2227-7390 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 17 |
Language | English |
ORCID | 0000-0003-3610-4111 |
OpenAccessLink | https://doaj.org/article/ccc7750229f74a7fbcd814381879a0c3 |
PublicationCentury | 2000 |
PublicationDate | 2024-09-01 |
PublicationDateYYYYMMDD | 2024-09-01 |
PublicationDate_xml | – month: 09 year: 2024 text: 2024-09-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | Basel |
PublicationPlace_xml | – name: Basel |
PublicationTitle | Mathematics (Basel) |
PublicationYear | 2024 |
Publisher | MDPI AG |
Publisher_xml | – name: MDPI AG |
StartPage | 2714 |
SubjectTerms | Chinese NER; Data smoothing; Datasets; Internet; named entity recognition; Natural language processing; Probability; Recognition; Regularity; Segmentation; Semantics; Smoothing; syntactic information; word-boundary smoothing; Words (language) |
Title | KCB-FLAT: Enhancing Chinese Named Entity Recognition with Syntactic Information and Boundary Smoothing Techniques |
URI | https://www.proquest.com/docview/3104066219 https://doaj.org/article/ccc7750229f74a7fbcd814381879a0c3 |
Volume | 12 |
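
The Boundary Smoothing step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `boundary_smooth`, its parameters, and the rule of sharing the smoothing mass equally among neighbouring spans are assumptions made for illustration only. The idea shown is that a gold entity span keeps probability 1 - epsilon, while the remaining epsilon is spread over spans whose boundaries differ slightly from the gold ones, so the model is not trained toward absolute certainty about a boundary.

```python
def boundary_smooth(gold_span, seq_len, epsilon=0.1, max_dist=1):
    """Return {(start, end): probability} soft targets for one gold span.

    Assumptions (illustrative, not taken from the paper):
    - spans are inclusive (start, end) character indices;
    - epsilon is shared equally by every valid span whose Manhattan
      distance to the gold span lies between 1 and max_dist.
    """
    gold_start, gold_end = gold_span
    neighbours = []
    start_range = range(max(0, gold_start - max_dist),
                        min(seq_len - 1, gold_start + max_dist) + 1)
    end_range = range(max(0, gold_end - max_dist),
                      min(seq_len - 1, gold_end + max_dist) + 1)
    for start in start_range:
        for end in end_range:
            dist = abs(start - gold_start) + abs(end - gold_end)
            # Keep only well-formed spans near, but not equal to, the gold span.
            if start <= end and 1 <= dist <= max_dist:
                neighbours.append((start, end))

    if not neighbours:
        # Nothing to smooth over, so the gold span keeps all the mass.
        return {gold_span: 1.0}

    targets = {gold_span: 1.0 - epsilon}
    share = epsilon / len(neighbours)
    for span in neighbours:
        targets[span] = share
    return targets


if __name__ == "__main__":
    # Gold entity covers characters 2..3 in a 6-character sentence.
    for span, prob in sorted(boundary_smooth((2, 3), 6, epsilon=0.2).items()):
        print(span, round(prob, 3))
```

The resulting dictionary can serve as a soft target distribution for a span classifier in place of a one-hot label. How the paper weights neighbours at different distances is not stated in this record, so the equal split here is only one possible choice.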