KCB-FLAT: Enhancing Chinese Named Entity Recognition with Syntactic Information and Boundary Smoothing Techniques

Bibliographic Details
Published in: Mathematics (Basel) Vol. 12; no. 17; p. 2714
Main Authors: Deng, Zhenrong, Huang, Zheng, Wei, Shiwei, Zhang, Jinglin
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 1 September 2024
Abstract Named entity recognition (NER) is a fundamental task in Natural Language Processing (NLP). During training, NER models suffer from over-confidence. Chinese NER is particularly affected because it involves word segmentation, which can introduce erroneous entity boundaries, exacerbating over-confidence and reducing the model’s overall performance. These issues limit further improvement of NER models. To tackle these problems, we propose a new model named KCB-FLAT, designed to enhance Chinese NER performance by integrating enriched semantic information with a word-boundary smoothing technique. Specifically, we first extract various types of syntactic information and encode them with a syntax-based Key-Value Memory Network, which integrates them through an attention mechanism to generate syntactic feature embeddings for Chinese characters. Next, we employ an encoder named Cross-Transformer to thoroughly fuse syntactic and lexical information, addressing the entity boundary segmentation errors introduced by lexical information. Finally, we introduce a Boundary Smoothing module, combined with a regularity-conscious function, to capture the internal regularity of each entity and to reduce, through smoothing, the model’s over-confidence in entity probabilities. Experimental results on the MSRA, Resume, Weibo, and self-built ZJ datasets show that the proposed model achieves excellent performance as measured by F1 score.
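The abstract describes the syntactic branch only at a high level. As a rough illustration, and not the KCB-FLAT code, here is a minimal sketch of how a key-value memory read over syntactic information is commonly computed: a character's hidden vector scores each key (a context feature linked to the character), the scores are softmax-normalized, and the output is the weighted sum of the paired values (syntactic labels). The function names, the choice of keys and values, and the plain dot-product scoring are assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kv_memory_attention(h: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Read one character's key-value memory (illustrative sketch).

    h      -- hidden vector of the character from some character encoder
    keys   -- hypothetical key embeddings, one per context feature linked
              to the character (e.g. a word that contains it)
    values -- hypothetical value embeddings, one per syntactic label paired
              with each key (e.g. a POS tag or dependency relation)
    Returns sum_j softmax_j(h . k_j) * v_j.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and aligned")
    weights = softmax([dot(h, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


if __name__ == "__main__":
    h = [1.0, 0.0]
    keys = [[2.0, 0.0], [0.0, 2.0]]    # the first key aligns with h
    values = [[1.0, 0.0], [0.0, 1.0]]  # one-hot "syntactic label" vectors
    print(kv_memory_attention(h, keys, values))
```

With one-hot value vectors the output equals the attention distribution, which makes the behaviour easy to inspect: the first key aligns with `h`, so the first value receives most of the weight. A trained model would learn the key and value embeddings and, per the abstract, combine the readouts for several syntax types through attention.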
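The Boundary Smoothing module builds on the boundary smoothing technique for span-based NER (Zhu and Li, ACL 2022). That technique moves a small probability mass ε from each annotated span to spans whose boundaries lie close to it. The sketch below constructs such soft targets for a single entity under several stated assumptions. Spans are inclusive (start, end) character indices. Distance is the Manhattan distance between boundary pairs. Each distance level 1..D receives an equal share ε/D, split evenly among the valid spans at that level. The function name and the fallback used when no valid neighbour exists are hypothetical, and the regularity-conscious function that KCB-FLAT adds is not modeled.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end), inclusive character indices


def boundary_smoothed_targets(
    entity: Span, seq_len: int, eps: float = 0.1, d_max: int = 1
) -> Dict[Span, float]:
    """Soft target distribution for one gold entity span (sketch).

    Assumed scheme: the gold span keeps 1 - eps; every valid span whose
    Manhattan distance |s - s'| + |e - e'| to the gold span equals d
    (1 <= d <= d_max) shares eps / d_max evenly with the other spans at d.
    """
    start, end = entity
    if not (0 <= start <= end < seq_len):
        raise ValueError("entity span out of range")
    targets: Dict[Span, float] = {entity: 1.0 - eps}
    for d in range(1, d_max + 1):
        # Enumerate each (start, end) shift with |ds| + |de| == d exactly once.
        ring: List[Span] = []
        for ds in range(-d, d + 1):
            rest = d - abs(ds)
            for de in {-rest, rest}:
                s, e = start + ds, end + de
                if 0 <= s <= e < seq_len:  # keep only well-formed spans
                    ring.append((s, e))
        if ring:
            share = eps / d_max / len(ring)
            for span in ring:
                targets[span] = targets.get(span, 0.0) + share
        else:
            # Hypothetical fallback: with no valid neighbours at this
            # distance, its share of the mass stays on the gold span.
            targets[entity] += eps / d_max
    return targets


if __name__ == "__main__":
    # Entity covering characters 2..3 in a 6-character sentence.
    for span, p in sorted(boundary_smoothed_targets((2, 3), 6).items()):
        print(span, round(p, 3))
```

During training, such a distribution would replace the one-hot span label in the cross-entropy loss. The model is then no longer pushed toward probability 1 on a single, possibly mis-segmented, boundary choice, which is the over-confidence the abstract aims to reduce.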
Copyright 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.3390/math12172714
Discipline Mathematics
EISSN 2227-7390
GeographicLocations China
ISSN 2227-7390
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
ORCID 0000-0003-3610-4111 (Wei, Shiwei)
OpenAccessLink https://doaj.org/article/ccc7750229f74a7fbcd814381879a0c3
SubjectTerms Chinese NER
Data smoothing
Datasets
Internet
named entity recognition
Natural language processing
Probability
Recognition
Regularity
Segmentation
Semantics
Smoothing
syntactic information
word-boundary smoothing
Words (language)
URI https://www.proquest.com/docview/3104066219