Perceptual Monocular Depth Estimation

Bibliographic Details
Published in:	Neural Processing Letters, Vol. 53, no. 2, pp. 1205-1228
Main Authors:	Pan, Janice; Bovik, Alan C.
Format:	Journal Article
Language:	English
Published:	New York: Springer US, 01-04-2021; Springer Nature B.V.
Subjects:	Artificial Intelligence; Bivariate analysis; Complex Systems; Computational Intelligence; Computer Science; Computer vision; Deep learning; Dictionaries; Image processing; Image quality; Quality assessment; Statistics; Video; Bivariate natural scene statistics; Natural scene statistics; Monocular depth estimation; Depth estimation; Neural networks

Description
Summary:	Monocular depth estimation (MDE), which is the task of using a single image to predict scene depths, has gained considerable interest, in large part owing to the popularity of applying deep learning methods to solve “computer vision problems”. Monocular cues provide sufficient data for humans to instantaneously extract an understanding of scene geometries and relative depths, which is evidence of both the processing power of the human visual system and the predictive power of the monocular data. However, developing computational models to predict depth from monocular images remains challenging. Hand-designed MDE features do not perform particularly well, and even current “deep” models are still evolving. Here we propose a novel approach that uses perceptually relevant natural scene statistics (NSS) features to predict depths from monocular images in a simple, scale-agnostic way that is competitive with state-of-the-art systems. While the statistics of natural photographic images have been successfully used in a variety of image and video processing, analysis, and quality assessment tasks, they have never been applied in a predictive end-to-end deep-learning model for monocular depth. Correspondingly, no previous work has explicitly incorporated perceptual features in a monocular depth-prediction approach. Here we accomplish this by developing a new closed-form bivariate model of image luminances and using features extracted from this model and from other NSS models to drive a novel deep learning framework for predicting depth given a single image.
ISSN:	1370-4621; 1573-773X
DOI:	10.1007/s11063-021-10437-6