Towards unified on-road object detection and depth estimation from a single image
Published in: International Journal of Machine Learning and Cybernetics, Vol. 13, No. 5, pp. 1231-1241
Main Authors:
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01-05-2022
Summary: On-road object detection based on convolutional neural networks (CNNs) is an important problem in the field of automatic driving. However, traditional 2D object detection aims to accomplish object classification and localization in image space, and lacks the ability to acquire depth information. Moreover, it is inefficient to cascade an object detection network with a monocular depth estimation network to realize 2.5D object detection. To address this problem, we propose a unified multi-task learning mechanism for object detection and depth estimation. First, we propose an innovative loss function, namely the projective consistency loss, which uses the perspective projection principle to model the transformation relationship between target size and depth value, so that the object detection task and the depth estimation task can mutually constrain each other. Then, we propose a global multi-scale feature extraction scheme that combines the Global Context (GC) and Atrous Spatial Pyramid Pooling (ASPP) blocks, which promotes effective feature learning and collaborative learning between object detection and depth estimation. Comprehensive experiments conducted on the KITTI and Cityscapes datasets show that our approach achieves high mAP and low distance estimation error, outperforming other state-of-the-art methods.
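The projective consistency idea in the summary can be sketched with the standard pinhole-camera relation, where depth is proportional to focal length times real-world object height divided by the object's image (bounding-box) height. The function names, the assumed prior height, and the L1 form of the loss below are illustrative assumptions, not the paper's exact formulation:

```python
def projected_depth(focal_length_px: float, real_height_m: float,
                    bbox_height_px: float) -> float:
    """Pinhole-camera relation: depth = f * H_real / h_pixel (meters)."""
    return focal_length_px * real_height_m / bbox_height_px

def projective_consistency_loss(pred_depth_m: float, focal_length_px: float,
                                real_height_m: float, bbox_height_px: float) -> float:
    """Penalize disagreement between the depth head's prediction and the
    depth implied by the detected box height. The L1 penalty here is an
    assumption; the paper may use a different penalty."""
    implied_depth = projected_depth(focal_length_px, real_height_m, bbox_height_px)
    return abs(pred_depth_m - implied_depth)

# Illustrative numbers: a KITTI-like focal length of ~721 px and an assumed
# car height prior of 1.5 m; a 54 px tall box then implies a depth of
# 721 * 1.5 / 54 ≈ 20.03 m.
```

Coupling the two tasks through this relation means an error in either the predicted box height or the predicted depth raises the loss, which is how the detection and depth heads constrain one another during joint training.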
ISSN: 1868-8071, 1868-808X
DOI: 10.1007/s13042-021-01444-z