Visual Perception for Robotic Spatial Understanding

Bibliographic Details
Main Author: Owens, Jason
Format: Dissertation
Language: English
Published: ProQuest Dissertations & Theses, 2019
ISBN: 9781085567572
Subjects: Artificial intelligence; Computer science; Robotics
Online Access: https://www.proquest.com/docview/2272720370
Abstract
Humans understand the world through vision with little effort. We perceive the structure, objects, and people in our environment and pay little direct attention to most of it until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. Instead, we must devise algorithmic methods for converting raw sensor data into something useful, very quickly. Vision is so necessary to building a robot, or any intelligent system meant to interact with the world, that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult.

There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human, and some algorithms can also provide bounding boxes or pixel-level masks to localize each object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances, given the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels on a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot explores. Finally, while we can detect the pose of very specific objects, we do not yet have a pose-detection mechanism that generalizes well across categories or that can describe new objects efficiently.

We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot, spanning up to three different modalities. Second, we present an approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3D over-segmentation technique that uses the models and ego-motion output of the previous step to generate segmentations that remain temporally consistent under camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D-based object pose recognition using a novel network architecture we call PartNet.
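The record gives only this summary of the dissertation's methods, but the extrinsic-calibration problem the abstract raises is easy to make concrete: once each sensor's pose relative to the robot base is known, integrating multi-sensor data "in a single frame" reduces to composing rigid-body transforms. The minimal NumPy sketch below illustrates that convention with hypothetical camera and lidar poses; it is an illustration of why calibrated extrinsics matter, not the calibration system the dissertation describes.

import numpy as np

def make_transform(R, t):
    # Build a 4x4 homogeneous transform from a 3x3 rotation and 3-vector translation.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(theta):
    # Rotation about the z-axis by theta radians.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical extrinsics: each sensor's pose expressed in the robot base frame.
T_base_cam = make_transform(rot_z(np.pi / 2), np.array([0.2, 0.0, 1.0]))
T_base_lidar = make_transform(np.eye(3), np.array([0.0, 0.0, 1.5]))

# A point observed in the camera frame, mapped into the shared base frame.
p_cam = np.array([1.0, 0.5, 2.0, 1.0])   # homogeneous coordinates
p_base = T_base_cam @ p_cam

# The camera-to-lidar transform follows by composition, so any sensor pair
# can exchange measurements once the per-sensor extrinsics are known.
T_lidar_cam = np.linalg.inv(T_base_lidar) @ T_base_cam
p_lidar = T_lidar_cam @ p_cam

print("point in base frame: ", p_base[:3])
print("point in lidar frame:", p_lidar[:3])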
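Likewise, the abstract only names the RGB-D visual odometry contribution without detail. As a point of reference, a core building block in many odometry and mapping pipelines is estimating the rigid motion between two corresponded 3D point sets, solved in closed form by the Kabsch/Procrustes method. The sketch below shows that step under the (strong) assumption of known correspondences; it is generic background, not the dissertation's actual pipeline.

import numpy as np

def rigid_align(src, dst):
    # Return R (3x3) and t (3,) minimizing sum ||R @ src_i + t - dst_i||^2
    # over corresponded point rows (Kabsch algorithm).
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic check: recover a known frame-to-frame motion.
rng = np.random.default_rng(0)
pts = rng.standard_normal((100, 3))
theta = 0.1
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.05, -0.02, 0.01])
moved = pts @ R_true.T + t_true
R_est, t_est = rigid_align(pts, moved)
assert np.allclose(R_est, R_true) and np.allclose(t_est, t_true)

Real RGB-D odometry must also establish the correspondences (e.g., via feature matching or iterative closest point) and handle outliers, which is where methods differ.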