A Multiple-Instance Densely-Connected ConvNet for Aerial Scene Classification

Bibliographic Details
Published in:	IEEE transactions on image processing Vol. 29; p. 1
Main Authors:	Bi, Qi, Qin, Kun, Li, Zhili, Zhang, Han, Xu, Kai, Xia, Gui-Song
Format:	Journal Article
Language:	English
Published:	United States IEEE 01-01-2020 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	aerial image; Artificial neural networks; Classification; Classifiers; Convolution; convolutional neural network; dense connection; Feature extraction; Learning; Machine learning; multiple instance learning; Neural networks; Scene classification; Semantics; Task analysis; Training; Visualization

Description
Summary:	In contrast with natural scenes, aerial scenes are often composed of many objects densely distributed on the ground as seen from a bird's-eye view, and describing them usually demands more discriminative features as well as local semantics. However, when applied to scene classification, most existing convolutional neural networks (ConvNets) tend to depict the global semantics of images, and the loss of low- and mid-level features can hardly be avoided, especially as the model goes deeper. To tackle these challenges, we propose a multiple-instance densely-connected ConvNet (MIDC-Net) for aerial scene classification. It treats aerial scene classification as a multiple-instance learning problem so that local semantics can be further investigated. Our classification model consists of an instance-level classifier, a multiple-instance pooling layer, and a bag-level classification layer. In the instance-level classifier, we propose a simplified dense connection structure to effectively preserve features from different levels. The extracted convolutional features are then converted into instance feature vectors. Next, we propose a trainable attention-based multiple-instance pooling, which highlights the local semantics relevant to the scene label and outputs the bag-level probability directly. Finally, with our bag-level classification layer, this multiple-instance learning framework is under the direct supervision of bag labels. Experiments on three widely used aerial scene benchmarks demonstrate that our proposed method outperforms many state-of-the-art methods by a large margin with far fewer parameters.
ISSN:	1057-7149, 1941-0042
DOI:	10.1109/TIP.2020.2975718
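
Illustration:	The summary describes an attention-based multiple-instance pooling that weights instance feature vectors and outputs a bag-level probability. The summary gives no formulas, so the following is only a minimal sketch of that general idea, not the authors' implementation. It assumes each instance vector holds one score per scene class, and the attention parameter `w`, the function names, and the toy numbers are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pooling(instances, w):
    """Pool instance vectors into bag-level class probabilities.

    instances: list of N instance vectors, each with C class scores
               (assumed layout, not taken from the paper).
    w:         hypothetical attention parameter vector of length C;
               in a trained model this would be learned.
    Returns (attention weights over instances, bag-level probabilities).
    """
    # One scalar relevance score per instance: dot product with w.
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in instances]
    # Normalise the scores so the attention weights sum to 1.
    attn = softmax(scores)
    # Attention-weighted sum of the instance vectors, class by class.
    num_classes = len(instances[0])
    pooled = [
        sum(a * h[c] for a, h in zip(attn, instances))
        for c in range(num_classes)
    ]
    # Convert the pooled scores into a bag-level probability vector.
    return attn, softmax(pooled)


# Toy bag: three image regions (instances), three scene classes.
instances = [[2.0, 0.1, 0.0], [0.2, 0.1, 0.1], [1.5, 0.3, 0.2]]
w = [1.0, 0.0, 0.0]
attn, bag_probs = attention_mil_pooling(instances, w)
print(attn, bag_probs)
```

In this toy bag, the first and third instances score highest under `w`, so they dominate the pooled vector; the bag label is then read from `bag_probs`, which is how supervision by bag labels alone can reach the instance level.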