Deep Learning Architectures for 2D and 3D Scene Perception

Bibliographic Details
Main Author: Bao, Rina
Format: Dissertation
Language: English
Published: ProQuest Dissertations & Theses 01-01-2021
Subjects:
Description
Summary: Scene understanding is a fundamental problem in computer vision that has been explored more intensively in recent years with the development of deep learning. In this dissertation, we propose deep learning architectures to address challenges in 2D and 3D scene perception. We developed several novel architectures for city-scale 3D point cloud understanding that effectively capture both long-range and short-range information to handle the large variations in object size that make city-scale point cloud segmentation challenging. GLSNet++ is a two-branch network for multiscale point cloud segmentation that models this complex problem with both a global and a local processing stream, capturing different levels of contextual and structural 3D point cloud information. We developed PointGrad, a new graph convolution gradient operator for capturing structural relationships, which encodes point-based directional gradients into a high-dimensional multiscale tensor space. Using the PointGrad operator with graph convolution on scattered, irregular point sets captures the salient structural information in the point cloud across spatial and feature scale space, enabling efficient learning. We integrated PointGrad with several deep network architectures for large-scale 3D point cloud semantic segmentation, including indoor scene and object part segmentation. In many real application areas, including remote sensing and aerial imaging, class imbalance is common, and sufficient data for rare classes is hard to acquire or carries a high cost of expert labeling. We developed MDXNet for few-shot and zero-shot learning, which emulates the human visual system by leveraging multi-domain knowledge from general visual primitives with transfer learning for more specialized learning tasks in various application domains. We also extended deep learning methods to other domains, including the materials domain for predicting carbon nanotube forest attributes and mechanical properties, and the biomedical domain for cell segmentation.
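
The following is a minimal, illustrative sketch of the kind of point-based directional gradient operator the abstract describes: for each point, gather k nearest neighbours, form directional gradient features (feature differences paired with normalized spatial offsets), and aggregate them with a shared MLP in the spirit of a graph/edge convolution. The names (PointGradConv, knn_indices), the neighbourhood size, and the exact gradient definition are assumptions for illustration, not the dissertation's actual implementation.

import torch
import torch.nn as nn


def knn_indices(xyz: torch.Tensor, k: int) -> torch.Tensor:
    """Return indices of the k nearest neighbours for each point.

    xyz: (B, N, 3) point coordinates.
    """
    dists = torch.cdist(xyz, xyz)                          # (B, N, N) pairwise distances
    return dists.topk(k, dim=-1, largest=False).indices    # (B, N, k)


class PointGradConv(nn.Module):
    """Graph-convolution block over per-point directional gradient features (sketch)."""

    def __init__(self, in_dim: int, out_dim: int, k: int = 16):
        super().__init__()
        self.k = k
        # Edge MLP consumes [centre feature, neighbour-centre difference, offset direction].
        self.mlp = nn.Sequential(
            nn.Linear(2 * in_dim + 3, out_dim),
            nn.ReLU(inplace=True),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, xyz: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        B, N, C = feats.shape
        idx = knn_indices(xyz, self.k)                              # (B, N, k)
        batch = torch.arange(B, device=xyz.device).view(B, 1, 1)
        nbr_xyz = xyz[batch, idx]                                   # (B, N, k, 3)
        nbr_feats = feats[batch, idx]                               # (B, N, k, C)

        offset = nbr_xyz - xyz.unsqueeze(2)                         # spatial offsets to neighbours
        direction = offset / (offset.norm(dim=-1, keepdim=True) + 1e-8)
        grad = nbr_feats - feats.unsqueeze(2)                       # feature differences ("gradients")

        edge = torch.cat(
            [feats.unsqueeze(2).expand(-1, -1, self.k, -1), grad, direction], dim=-1
        )
        edge = self.mlp(edge)                                       # (B, N, k, out_dim)
        return edge.max(dim=2).values                               # max-pool over neighbourhood


if __name__ == "__main__":
    pts = torch.randn(2, 1024, 3)
    conv = PointGradConv(in_dim=3, out_dim=64, k=16)
    out = conv(pts, pts)          # use raw coordinates as initial features
    print(out.shape)              # torch.Size([2, 1024, 64])

In this sketch the per-point output could feed a segmentation head, and stacking such blocks at different neighbourhood sizes would be one plausible way to obtain the multiscale spatial and feature context the abstract refers to; the dissertation's actual GLSNet++ and PointGrad designs may differ.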
ISBN:9798841786917