Region-based adaptive association learning for robust image scene recognition

Scene recognition is challenging due to the complicated spatial arrangement and varied object distribution inside the scene images. Deep learning methods, especially convolutional neural networks (CNNs), have boosted the performance of scene recognition significantly because of the multistage global...

Full description

Saved in:
Bibliographic Details
Published in:The Visual computer Vol. 39; no. 4; pp. 1629 - 1649
Main Authors: Lv, Guangrui, Dong, Lili, Zhang, Wenwen, Xu, Wenhai
Format: Journal Article
Language:English
Published: Berlin/Heidelberg Springer Berlin Heidelberg 01-04-2023
Springer Nature B.V
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Scene recognition is challenging due to the complicated spatial arrangement and varied object distribution inside the scene images. Deep learning methods, especially convolutional neural networks (CNNs), have boosted the performance of scene recognition significantly because of the multistage global feature learning ability. However, the CNNs ignore the local semantics and diverse relationships among local regions, which play an important role in scene recognition. In this paper, a region-based adaptive association learning framework is proposed to tackle these challenges. This framework contains two sub-networks to separately extract the features of semantic distribution and contextual arrangement, and deep fusion networks to fuse such features in a joint and boosting fashion. Firstly, we simplify the current CNN structure so that our feature extractor can maintain high-level spatial region representation ability. Then, we propose a novel regional correlation learning architecture to explore the arbitrary direction dependence of the region and the four directions’ long-range strip context dependence of the region in different spatial dimensions. Moreover, we design different attention strategies which include context gating, recurrent attention, and gated attention to further refine the extracted regional relationships and highlight local semantic information. Finally, the collaborative relationship features are deeply fused and then fed into a classification layer so that the whole framework is optimized as a whole. Extensive experimental evaluations on both ocean scene and land scene datasets coming from some different fields show that the proposed method achieves better accurate prediction on scene recognition than other state-of-the-art models.
ISSN:0178-2789
1432-2315
DOI:10.1007/s00371-022-02433-1