iSEA: An Interactive Pipeline for Semantic Error Analysis of NLP Models

Bibliographic Details
Main Authors: Yuan, Jun; Vig, Jesse; Rajani, Nazneen
Format: Journal Article
Language: English
Published: 2022-03-08
Abstract: Error analysis in NLP models is essential to successful model development and deployment. One common approach for diagnosing errors is to identify subpopulations in the dataset where the model produces the most errors. However, existing approaches typically define subpopulations based on pre-defined features, which requires users to form hypotheses of errors in advance. To complement these approaches, we propose iSEA, an Interactive Pipeline for Semantic Error Analysis in NLP Models, which automatically discovers semantically-grounded subpopulations with high error rates in the context of a human-in-the-loop interactive system. iSEA enables model developers to learn more about their model errors through discovered subpopulations, validate the sources of errors through interactive analysis on the discovered subpopulations, and test hypotheses about model errors by defining custom subpopulations. The tool supports semantic descriptions of error-prone subpopulations at the token and concept level, as well as pre-defined higher-level features. Through use cases and expert interviews, we demonstrate how iSEA can assist error understanding and analysis.
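The idea the abstract starts from, finding subpopulations of a dataset where a model makes the most errors, can be illustrated with a minimal sketch. It assumes the simplest token-level definition (a subpopulation is every example whose text contains a given word) and uses hypothetical names (`token_error_rates`, `min_support`); it is not iSEA's discovery algorithm, which this record does not describe.

```python
from collections import defaultdict


def token_error_rates(examples, min_support=2):
    """Rank token-defined subpopulations by model error rate.

    Hypothetical helper for illustration only (not iSEA's method).
    examples: iterable of (text, is_error) pairs, where is_error is True
    when the model's prediction for that text was wrong.
    A token's subpopulation is every example whose lowercased,
    whitespace-split text contains that token.
    """
    counts = defaultdict(lambda: [0, 0])  # token -> [n_examples, n_errors]
    for text, is_error in examples:
        for token in set(text.lower().split()):  # count each token once per example
            counts[token][0] += 1
            counts[token][1] += int(is_error)

    ranked = [
        (token, errors / total, total)
        for token, (total, errors) in counts.items()
        if total >= min_support  # skip tiny, noisy subpopulations
    ]
    # Highest error rate first; break ties by larger support, then by token.
    ranked.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return ranked


# Toy data: (input text, whether the model got it wrong).
examples = [
    ("the movie was not good", True),
    ("not bad at all", True),
    ("a good film", False),
    ("the plot was good", False),
]
for token, rate, support in token_error_rates(examples)[:3]:
    print(f"{token!r}: error rate {rate:.2f} over {support} examples")
```

On this toy data the subpopulation defined by "not" ranks first, since both examples containing it are errors. The `min_support` threshold reflects a common trade-off: very small subpopulations can show extreme error rates by chance, so a sketch like this ignores them rather than surfacing them as findings.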
View published paper (access to full text may be restricted): https://doi.org/10.1145/3490099.3511146
View paper in arXiv: https://doi.org/10.48550/arXiv.2203.04408
Copyright: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI: 10.48550/arXiv.2203.04408
Access: Open access; not peer reviewed
Resource type: preprint
Subjects: Computer Science - Computation and Language; Computer Science - Human-Computer Interaction
arXiv abstract page: https://arxiv.org/abs/2203.04408