Do Game Data Generalize Well for Remote Sensing Image Segmentation?
Published in: Remote Sensing (Basel, Switzerland), Vol. 12, No. 2, p. 275
Main Authors:
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01-01-2020
Summary: Despite recent progress in deep learning and remote sensing image interpretation, adapting a deep learning model between different sources of remote sensing data remains a challenge. This paper investigates an interesting question: do synthetic data generalize well for remote sensing image applications? To answer this question, we take building segmentation as an example, training a deep learning model on the city map of the well-known video game “Grand Theft Auto V” and then adapting the model to real-world remote sensing images. We propose a segmentation framework based on generative adversarial training to improve the adaptability of the segmentation model. Our model consists of a CycleGAN model and a ResNet-based segmentation network: the former is a well-known image-to-image translation framework that learns a mapping of images from the game domain to the remote sensing domain, while the latter learns to predict pixel-wise building masks from the translated data. All models in our method can be trained in an end-to-end fashion, and the segmentation model requires no additional ground-truth annotations of the real-world images. Experimental results on a public building segmentation dataset suggest the effectiveness of our adaptation method, which outperforms other state-of-the-art semantic segmentation methods such as Deeplab-v3 and UNet. Another advantage of our method is that introducing semantic information into the image-to-image translation framework further improves the image style conversion.
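
The summary above couples a CycleGAN-style generator with a ResNet-based segmentation network trained end-to-end on translated game imagery. Below is a minimal PyTorch sketch of that idea under stated assumptions: the GameToRSGenerator class, the use of torchvision's fcn_resnet50 as the ResNet-based segmenter, and all hyperparameters are illustrative, not the authors' released implementation, and the adversarial and cycle-consistency losses of the full CycleGAN are omitted.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import fcn_resnet50


class GameToRSGenerator(nn.Module):
    """Stand-in for the CycleGAN generator G: game image -> remote-sensing style.

    A real CycleGAN pairs this with a reverse generator and two discriminators;
    only the direction consumed by the segmentation branch is sketched here.
    """

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=7, padding=3), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)


generator = GameToRSGenerator()
segmenter = fcn_resnet50(num_classes=2)  # ResNet-backed head: building vs. background
seg_loss = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(
    list(generator.parameters()) + list(segmenter.parameters()), lr=2e-4
)

# One end-to-end training step driven only by synthetic (game) supervision.
game_images = torch.rand(2, 3, 256, 256)         # rendered game tiles (dummy batch)
game_masks = torch.randint(0, 2, (2, 256, 256))  # building masks taken from the game map

optimizer.zero_grad()
fake_rs = generator(game_images)     # translate game imagery to remote-sensing style
logits = segmenter(fake_rs)["out"]   # pixel-wise building logits on translated images
loss = seg_loss(logits, game_masks)  # adversarial / cycle-consistency terms omitted
loss.backward()
optimizer.step()
```

In the full method, the translated image would additionally be scored by a discriminator and mapped back to the game domain to enforce cycle consistency, so the translation and segmentation objectives are optimized jointly rather than in the simplified single-loss step shown here.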
ISSN: 2072-4292
DOI: 10.3390/rs12020275