Category attention guided network for semantic segmentation of Fine-Resolution remote sensing images

Bibliographic Details
Published in:International journal of applied earth observation and geoinformation Vol. 127; p. 103661
Main Authors: Wang, Shunli, Hu, Qingwu, Wang, Shaohua, Zhao, Pengcheng, Li, Jiayuan, Ai, Mingyao
Format: Journal Article
Language:English
Published: Elsevier B.V 01-03-2024
Description
Summary:
•We propose a category attention (CA) to model the differences between RS image pixels.
•We design CAGNet by integrating the CA into CNN and Transformer layers.
•Experiments show that our network achieves state-of-the-art results.

Semantic segmentation is an essential task in various fields, including land cover classification and cultural heritage investigation. CNNs and Transformers have been widely utilized for semantic segmentation thanks to notable advances in deep learning. However, these methodologies may not fully account for the distinctive attributes of remote sensing images, namely large intra-class variation and small inter-class variation. Driven by this, we propose a category attention guided network (CAGNet). First, a local feature extraction module is devised to handle striped objects and features at different scales. Then, we propose a novel concept of category attention for remote sensing images as a feature representation of the category differences between pixels. Meanwhile, we design Transformer-based and CNN-based category attention guided modules to integrate the proposed category attention into the global scoring functions and the local category feature weights, respectively. The network thereby gives more attention to category features by updating these weights during training. Finally, a feature fusion module integrates global, local, and category multi-scale features together with contextual information. Extensive experiments and ablation studies on the UAVid, Vaihingen, and Potsdam datasets indicate that our network outperforms existing methods, including those based on CNNs and Transformers.
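The abstract only names the mechanism, so the following is a minimal, hypothetical NumPy sketch of a category-attention-style reweighting, not the authors' actual module: per-category prototype features are pooled from coarse class scores, and each pixel feature is then augmented by an attention-weighted mix of those prototypes. All names (`category_attention`, `class_scores`) and the exact pooling/attention choices are assumptions for illustration.

```python
import numpy as np

def category_attention(features, class_scores):
    """Hypothetical sketch: inject per-category context into pixel features.

    features:     (C, H, W) pixel feature map
    class_scores: (K, H, W) coarse per-class scores (e.g. an auxiliary head)
    returns:      (C, H, W) category-aware features
    """
    C, H, W = features.shape
    K = class_scores.shape[0]
    F = features.reshape(C, -1)                      # (C, N) pixels as columns
    # Softmax over classes at each pixel -> soft category assignment.
    S = np.exp(class_scores.reshape(K, -1))
    S = S / S.sum(axis=0, keepdims=True)             # (K, N)
    # Per-category prototypes: score-weighted mean of pixel features.
    protos = (F @ S.T) / (S.sum(axis=1) + 1e-6)      # (C, K)
    # Attention: softmax similarity of each pixel to each prototype.
    attn = F.T @ protos                              # (N, K)
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)
    # Residual injection of category context back into pixel features.
    out = F + protos @ attn.T                        # (C, N)
    return out.reshape(C, H, W)
```

In the paper the analogous weights would be learned and updated during training; this static sketch only illustrates the data flow from class scores to category-conditioned pixel features.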
ISSN:1569-8432
1872-826X
DOI:10.1016/j.jag.2024.103661