SEMPANet: A Modified Path Aggregation Network with Squeeze-Excitation for Scene Text Detection

Recently, various object detection frameworks have been applied to text detection tasks and have achieved good performance in the final detection. With the further expansion of text detection application scenarios, the research value of text detection topics has gradually increased. Text detection i...

Full description

Saved in:

Bibliographic Details
Published in:	Sensors (Basel, Switzerland) Vol. 21; no. 8; p. 2657
Main Authors:	Li, Shuangshuang, Cao, Wenming
Format:	Journal Article
Language:	English
Published:	Switzerland MDPI AG 09-04-2021 MDPI
Subjects:	Accuracy Algorithms Boxes Deep learning feature fusion Methods natural scene Quadrilaterals R&D Research & development Sensors text detection natural scene feature fusion text detection
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Recently, various object detection frameworks have been applied to text detection tasks and have achieved good performance in the final detection. With the further expansion of text detection application scenarios, the research value of text detection topics has gradually increased. Text detection in natural scenes is more challenging for horizontal text based on a quadrilateral detection box and for curved text of any shape. Most networks have a good effect on the balancing of target samples in text detection, but it is challenging to deal with small targets and solve extremely unbalanced data. We continued to use PSENet to deal with such problems in this work. On the other hand, we studied the problem that most of the existing scene text detection methods use ResNet and FPN as the backbone of feature extraction, and improved the ResNet and FPN network parts of PSENet to make it more conducive to the combination of feature extraction in the early stage. A SEMPANet framework without an anchor and in one stage is proposed to implement a lightweight model, which is embodied in the training time of about 24 h. Finally, we selected the two most representative datasets for oriented text and curved text to conduct experiments. On ICDAR2015, the improved network's latest results further verify its effectiveness; it reached 1.01% in F-measure compared with PSENet-1s. On CTW1500, the improved network performed better than the original network on average.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
ISSN:	1424-8220 1424-8220
DOI:	10.3390/s21082657