DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting

Bibliographic Details
Main Authors: Kim, Seonghyeon, Shin, Seung, Kim, Yoonsik, Cho, Han-Cheol, Kil, Taeho, Surh, Jaeheung, Park, Seunghyun, Lee, Bado, Baek, Youngmin
Format: Journal Article
Language: English
Published: 2022-03-09
Abstract: Recent end-to-end scene text spotters have achieved great improvements in recognizing arbitrary-shaped text instances. Common approaches to text spotting use region-of-interest pooling or segmentation masks to restrict features to single text instances. However, this makes it hard for the recognizer to decode the correct sequence when the detection is inaccurate, i.e., when one or more characters are cropped out. Considering that it is hard to decide word boundaries accurately with the detector alone, we propose a novel framework, the Detection-agnostic End-to-End Recognizer (DEER). The proposed method reduces the tight dependency between the detection and recognition modules by bridging them with a single reference point for each text instance instead of detected regions. The proposed method allows the decoder to recognize the text indicated by the reference point using features from the whole image. Since only a single point is required to recognize the text, the proposed method enables text spotting without an arbitrarily-shaped detector or bounding-polygon annotations. Experimental results show that the proposed method achieves competitive results on regular and arbitrarily-shaped text spotting benchmarks. Further analysis shows that DEER is robust to detection errors. The code and dataset will be publicly available.
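The abstract's central contrast, restricting recognition features to a detected region versus conditioning on a single reference point while the whole image stays visible, can be illustrated with a toy sketch. This is not DEER's decoder: it assumes plain single-head dot-product attention, uses the feature at the reference point as the query, and the names `point_conditioned` and `roi_restricted` are hypothetical. It only shows why an inaccurate box makes cropped characters unreachable, while point-conditioned attention can still weight them.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of feature vectors
Pos = Tuple[int, int]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(feature_map: FeatureMap, query: Vector, positions: List[Pos]) -> Dict[Pos, float]:
    # Scaled dot-product attention weights of `query` over the given positions.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, feature_map[i][j])) / scale
              for i, j in positions]
    return dict(zip(positions, softmax(scores)))


def point_conditioned(feature_map: FeatureMap, ref_point: Pos) -> Dict[Pos, float]:
    # Hypothetical: the query is the feature at the reference point, and the
    # keys come from every location in the image (nothing is cropped away).
    r, c = ref_point
    h, w = len(feature_map), len(feature_map[0])
    everywhere = [(i, j) for i in range(h) for j in range(w)]
    return attend(feature_map, feature_map[r][c], everywhere)


def roi_restricted(feature_map: FeatureMap, ref_point: Pos,
                   box: Tuple[int, int, int, int]) -> Dict[Pos, float]:
    # Baseline: keys limited to a detected box (top, left, bottom, right), inclusive.
    top, left, bottom, right = box
    r, c = ref_point
    inside = [(i, j) for i in range(top, bottom + 1) for j in range(left, right + 1)]
    return attend(feature_map, feature_map[r][c], inside)


# A 1 x 6 toy "word": one 2-d feature per character column.
word = [[[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4], [0.5, 0.5]]]
# The box is too short and cuts off the last two columns.
loose = roi_restricted(word, (0, 2), (0, 0, 0, 3))
full = point_conditioned(word, (0, 2))
print((0, 5) in loose)     # False: the cropped column is unreachable
print(full[(0, 5)] > 0.0)  # True: the whole image is still visible
```

The design choice being illustrated is only where the keys come from: both functions share the same attention routine, and the sole difference is whether the candidate positions are clipped to a box.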
Copyright: http://creativecommons.org/licenses/by/4.0
DOI: 10.48550/arxiv.2203.05122
Peer reviewed: No
Online Access: https://arxiv.org/abs/2203.05122
Resource type: Preprint
Subjects: Computer Science - Computer Vision and Pattern Recognition