Category attention guided network for semantic segmentation of Fine-Resolution remote sensing images
Published in: International Journal of Applied Earth Observation and Geoinformation, Vol. 127, p. 103661
Main Authors:
Format: Journal Article
Language: English
Published: Elsevier B.V, 01-03-2024
Summary:
•We propose a category attention (CA) mechanism to capture category differences between RS image pixels.
•We design CAGNet by integrating the CA into CNN and Transformer layers.
•Experiments show that our network achieves state-of-the-art results.

Semantic segmentation is an essential task in various fields, including land cover classification and cultural heritage investigation. CNNs and Transformers have been widely applied to semantic segmentation thanks to notable advances in deep learning. However, these methodologies may not fully account for the distinctive attributes of remote sensing images, namely large intra-class variation and small inter-class variation. Motivated by these characteristics, we propose a category attention guided network (CAGNet). First, a local feature extraction module is devised to handle striped objects and features at different scales. We then introduce the novel concept of category attention for remote sensing images as a feature representation of category differences between pixels. Meanwhile, we design Transformer-based and CNN-based category attention guided modules that integrate the proposed category attention into the global scoring functions and the local category feature weights, respectively. The network learns to give more attention to category features by updating these weights during training. Finally, a feature fusion module integrates global, local, and category multi-scale features together with contextual information. Extensive experiments and ablation studies on the UAVid, Vaihingen, and Potsdam datasets show that our network outperforms existing CNN-based and Transformer-based methods.
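The abstract describes category attention only at a high level. Below is a minimal NumPy sketch of one plausible reading of the idea, not the authors' implementation: soft per-class scores define category prototypes (probability-weighted means of pixel features), and pixel-to-prototype affinities then inject category information back into every pixel. The function name, shapes, and aggregation scheme are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def category_attention(feats, scores):
    """Hypothetical category-attention sketch (illustrative only).

    feats:  (C, H, W) pixel feature map
    scores: (K, H, W) coarse per-class logits for K categories
    Returns a (C, H, W) feature map enriched with category information.
    """
    C, H, W = feats.shape
    K = scores.shape[0]
    f = feats.reshape(C, H * W)                 # (C, N) pixels as columns
    p = softmax(scores.reshape(K, H * W), 0)    # (K, N) soft class assignment
    # Per-category prototypes: probability-weighted mean of pixel features.
    protos = (p @ f.T) / (p.sum(axis=1, keepdims=True) + 1e-6)  # (K, C)
    # Pixel-to-prototype affinity acts as the category attention map.
    attn = softmax(protos @ f, axis=0)          # (K, N)
    # Residual injection of prototype features into every pixel.
    out = f + protos.T @ attn                   # (C, N)
    return out.reshape(C, H, W)
```

In this sketch the attention weights depend on the current class scores, so updating the scoring branch during training would reshape the prototypes and thus the attention, which matches the paper's stated goal of steering the network toward category features.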
ISSN: 1569-8432, 1872-826X
DOI: 10.1016/j.jag.2024.103661