DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting
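Main Authors: | Kim, Seonghyeon, Shin, Seung, Kim, Yoonsik, Cho, Han-Cheol, Kil, Taeho, Surh, Jaeheung, Park, Seunghyun, Lee, Bado, Baek, Youngmin |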
---|---|
Format: | Journal Article |
Language: | English |
Published: | 2022-03-09 |
Subjects: | Computer Science - Computer Vision and Pattern Recognition |
Online Access: | https://arxiv.org/abs/2203.05122 |
Abstract | Recent end-to-end scene text spotters have greatly improved the
recognition of arbitrarily-shaped text instances. Common approaches to text
spotting use region-of-interest pooling or segmentation masks to restrict
features to single text instances. However, this makes it hard for the
recognizer to decode correct sequences when the detection is inaccurate, i.e.,
when one or more characters are cropped out. Because it is hard to determine
word boundaries accurately with the detector alone, we propose a novel
Detection-agnostic End-to-End Recognizer (DEER) framework. The proposed method
reduces the tight dependency between the detection and recognition modules by
bridging them with a single reference point for each text instance instead of
detected regions. The decoder can thus recognize the text indicated by the
reference point using features from the whole image. Since only a single point
is required to recognize the text, the proposed method enables text spotting
without an arbitrarily-shaped detector or bounding polygon annotations.
Experimental results show that the proposed method achieves competitive
results on regular and arbitrarily-shaped text spotting benchmarks. Further
analysis shows that DEER is robust to detection errors. The code and dataset
will be publicly available. |
---|---|
Author | Kil, Taeho Shin, Seung Cho, Han-Cheol Kim, Yoonsik Park, Seunghyun Kim, Seonghyeon Baek, Youngmin Surh, Jaeheung Lee, Bado |
Author_xml | – sequence: 1 givenname: Seonghyeon surname: Kim fullname: Kim, Seonghyeon – sequence: 2 givenname: Seung surname: Shin fullname: Shin, Seung – sequence: 3 givenname: Yoonsik surname: Kim fullname: Kim, Yoonsik – sequence: 4 givenname: Han-Cheol surname: Cho fullname: Cho, Han-Cheol – sequence: 5 givenname: Taeho surname: Kil fullname: Kil, Taeho – sequence: 6 givenname: Jaeheung surname: Surh fullname: Surh, Jaeheung – sequence: 7 givenname: Seunghyun surname: Park fullname: Park, Seunghyun – sequence: 8 givenname: Bado surname: Lee fullname: Lee, Bado – sequence: 9 givenname: Youngmin surname: Baek fullname: Baek, Youngmin |
BackLink | https://doi.org/10.48550/arXiv.2203.05122 (View paper in arXiv) |
ContentType | Journal Article |
Copyright | http://creativecommons.org/licenses/by/4.0 |
DOI | 10.48550/arxiv.2203.05122 |
DatabaseName | arXiv Computer Science arXiv.org |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
OpenAccessLink | https://arxiv.org/abs/2203.05122 |
PublicationDate | 2022-03-09 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Computer Vision and Pattern Recognition |
Title | DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting |
URI | https://arxiv.org/abs/2203.05122 |
linkProvider | Cornell University |
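
Illustrative note: the abstract describes recognizing the text indicated by a single reference point while drawing on features from the whole image, rather than from a cropped region. The toy sketch below shows one way that idea could look. It is not the DEER architecture: the function name `point_conditioned_attention`, the plain dot-product attention, and the tiny hand-made feature grid are all assumptions made for illustration only.

```python
import math


def point_conditioned_attention(features, ref_point):
    """Pool features from the WHOLE grid, guided by one reference point.

    features:  H x W grid (nested lists) of equal-length feature vectors.
    ref_point: (row, col) index of the reference point for one text instance.
    Hypothetical helper for illustration; not taken from the paper.
    """
    r, c = ref_point
    query = features[r][c]            # the reference point supplies the query
    scale = math.sqrt(len(query))

    positions, scores = [], []
    for i, row in enumerate(features):
        for j, vec in enumerate(row):
            positions.append((i, j))  # every location is scored, no cropping
            scores.append(sum(q * v for q, v in zip(query, vec)) / scale)

    # Numerically stable softmax over all image locations.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the feature vectors from the full grid.
    pooled = [0.0] * len(query)
    for w, (i, j) in zip(weights, positions):
        for k, value in enumerate(features[i][j]):
            pooled[k] += w * value
    return pooled, dict(zip(positions, weights))


# Toy 2 x 3 feature grid: two cells look like "text A", four like "text B".
grid = [
    [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0]],
    [[0.0, 4.0], [0.0, 4.0], [0.0, 4.0]],
]
pooled, weights = point_conditioned_attention(grid, (0, 0))
print(pooled)
print(weights)
```

In this sketch no location is excluded up front, so a character that a tight detection box would have cropped out can still contribute to the pooled feature. A real decoder would learn its queries and attention rather than use raw dot products.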