DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting

Bibliographic Details
Main Authors:	Kim, Seonghyeon, Shin, Seung, Kim, Yoonsik, Cho, Han-Cheol, Kil, Taeho, Surh, Jaeheung, Park, Seunghyun, Lee, Bado, Baek, Youngmin
Format:	Journal Article
Language:	English
Published:	2022-03-09
Subjects:	Computer Science - Computer Vision and Pattern Recognition

Abstract	Recent end-to-end scene text spotters have achieved great improvements in recognizing arbitrarily shaped text instances. Common approaches to text spotting use region-of-interest pooling or segmentation masks to restrict features to single text instances. However, this makes it hard for the recognizer to decode the correct sequence when the detection is inaccurate, i.e., when one or more characters are cropped out. Considering that it is hard to decide word boundaries accurately with the detector alone, we propose DEER, a novel Detection-agnostic End-to-End Recognizer framework. The proposed method reduces the tight dependency between the detection and recognition modules by bridging them with a single reference point for each text instance instead of with detected regions. This allows the decoder to recognize the text indicated by the reference point using features from the whole image. Since only a single point is required to recognize the text, the proposed method enables text spotting without an arbitrarily shaped detector or bounding polygon annotations. Experimental results show that the proposed method achieves competitive results on regular and arbitrarily shaped text spotting benchmarks. Further analysis shows that DEER is robust to detection errors. The code and dataset will be publicly available.
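The abstract's central idea, recognizing a text instance from a single reference point while attending to features from the whole image rather than from a cropped region, can be illustrated with a toy sketch. It is not the authors' decoder: it assumes plain dot-product attention whose keys are 2D sinusoidal positional encodings of feature-map cells, and the names `sinusoidal_encoding`, `point_query_attention` and `roi_crop` are hypothetical, introduced only for illustration.

```python
import math


def sinusoidal_encoding(x: float, y: float, dim: int) -> list[float]:
    """2D sinusoidal positional encoding: the first half of the channels
    encodes x, the second half encodes y (one sin/cos pair per frequency)."""
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    quarter = dim // 4
    enc: list[float] = []
    for coord in (x, y):
        for i in range(quarter):
            freq = 1.0 / (10000 ** (i / quarter))
            enc.append(math.sin(coord * freq))
            enc.append(math.cos(coord * freq))
    return enc


def point_query_attention(
    height: int, width: int, ref_point: tuple[float, float], dim: int = 16
) -> dict[tuple[int, int], float]:
    """Softmax attention weights of one reference-point query over every
    cell (x, y) of a height x width feature map.

    Hypothetical simplification: keys are positional encodings only; a real
    recognizer would also mix in visual content features."""
    query = sinusoidal_encoding(ref_point[0], ref_point[1], dim)
    scale = math.sqrt(dim)
    scores: dict[tuple[int, int], float] = {}
    for y in range(height):
        for x in range(width):
            key = sinusoidal_encoding(x, y, dim)
            scores[(x, y)] = sum(q * k for q, k in zip(query, key)) / scale
    # Numerically stable softmax over all cells of the whole image.
    max_score = max(scores.values())
    exp_scores = {cell: math.exp(s - max_score) for cell, s in scores.items()}
    total = sum(exp_scores.values())
    return {cell: e / total for cell, e in exp_scores.items()}


def roi_crop(
    height: int, width: int, box: tuple[int, int, int, int]
) -> set[tuple[int, int]]:
    """Cells kept by a box-based crop (x0, y0, x1, y1, inclusive); every
    cell outside the detected box is discarded before recognition."""
    x0, y0, x1, y1 = box
    return {
        (x, y)
        for y in range(height)
        for x in range(width)
        if x0 <= x <= x1 and y0 <= y <= y1
    }


if __name__ == "__main__":
    # A word spans columns 2..9 on row 3, but the detected box stops at column 7.
    H, W = 8, 12
    weights = point_query_attention(H, W, ref_point=(5.0, 3.0))
    kept = roi_crop(H, W, box=(2, 2, 7, 4))

    missed_char = (9, 3)
    print(missed_char in kept)            # False: the crop drops the last character
    print(weights[missed_char] > 0)       # True: attention can still reach it
    print(max(weights, key=weights.get))  # (5, 3): the reference point's cell
```

Because the query and key encodings share the same frequencies, their dot product reduces to a sum of cosines of the coordinate differences, so the weight peaks at the reference point's cell while every other cell keeps a nonzero weight. That is the property the sketch is meant to contrast with a box crop, which assigns zero weight to anything the detector missed.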
BackLink	https://doi.org/10.48550/arXiv.2203.05122 (View paper in arXiv)
ContentType	Journal Article
Copyright	http://creativecommons.org/licenses/by/4.0
DBID	AKY GOX
DOI	10.48550/arxiv.2203.05122
DatabaseName	arXiv Computer Science arXiv.org
DatabaseTitleList
Database_xml	– sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository
DeliveryMethod	fulltext_linktorsrc
ExternalDocumentID	2203_05122
GroupedDBID	AKY GOX
ID	FETCH-LOGICAL-a672-15d851ebde399b8f18acf1b8ea6b2c1043f6952c758fc70fad74caaa6f79fd4d3
IEDL.DBID	GOX
IngestDate	Mon Jan 08 05:41:43 EST 2024
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-a672-15d851ebde399b8f18acf1b8ea6b2c1043f6952c758fc70fad74caaa6f79fd4d3
OpenAccessLink	https://arxiv.org/abs/2203.05122
ParticipantIDs	arxiv_primary_2203_05122
PublicationCentury	2000
PublicationDate	2022-03-09
PublicationDateYYYYMMDD	2022-03-09
PublicationDate_xml	– month: 03 year: 2022 text: 2022-03-09 day: 09
PublicationDecade	2020
PublicationYear	2022
Score	1.8378562
SecondaryResourceType	preprint
SourceID	arxiv
SourceType	Open Access Repository
SubjectTerms	Computer Science - Computer Vision and Pattern Recognition
Title	DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting
URI	https://arxiv.org/abs/2203.05122
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
link.rule.ids	228,230,782,887
linkProvider	Cornell University
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=DEER%3A+Detection-agnostic+End-to-End+Recognizer+for+Scene+Text+Spotting&rft.au=Kim%2C+Seonghyeon&rft.au=Shin%2C+Seung&rft.au=Kim%2C+Yoonsik&rft.au=Cho%2C+Han-Cheol&rft.date=2022-03-09&rft_id=info:doi/10.48550%2Farxiv.2203.05122&rft.externalDocID=2203_05122