Visual Perception for Robotic Spatial Understanding

Bibliographic Details
Main Author: Owens, Jason
Format: Dissertation
Language: English
Published: ProQuest Dissertations & Theses, 2019
ISBN: 9781085567572
Subjects: Artificial intelligence; Computer science; Robotics
Online Access: https://www.proquest.com/docview/2272720370
Abstract
Humans understand the world through vision with little effort. We perceive the structure, objects, and people in our environment and pay little direct attention to most of it until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. Instead, we must devise algorithmic methods for converting raw sensor data into something useful, very quickly. Vision is so necessary to building a robot, or any intelligent system meant to interact with the world, that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult.

There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human, and some algorithms can also provide bounding boxes or pixel-level masks to localize each object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances, given the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels on a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot explores. Finally, while we can detect the pose of very specific objects, we do not yet have a pose-detection mechanism that generalizes well across categories or that can describe new objects efficiently.

We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot, spanning up to three different modalities. Second, we present an approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3D over-segmentation technique that uses the models and ego-motion output of the previous step to generate segmentations that remain temporally consistent under camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D-based object pose recognition using a novel network architecture we call PartNet.
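The record gives only this summary of the dissertation's methods, but the extrinsic-calibration problem the abstract raises is easy to make concrete: once each sensor's pose relative to the robot base is known, integrating multi-sensor data "in a single frame" reduces to composing rigid-body transforms. The minimal NumPy sketch below illustrates that convention with hypothetical camera and lidar poses; it is an illustration of why calibrated extrinsics matter, not the calibration system the dissertation describes.

import numpy as np

def make_transform(R, t):
    # Build a 4x4 homogeneous transform from a 3x3 rotation and 3-vector translation.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(theta):
    # Rotation about the z-axis by theta radians.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical extrinsics: each sensor's pose expressed in the robot base frame.
T_base_cam = make_transform(rot_z(np.pi / 2), np.array([0.2, 0.0, 1.0]))
T_base_lidar = make_transform(np.eye(3), np.array([0.0, 0.0, 1.5]))

# A point observed in the camera frame, mapped into the shared base frame.
p_cam = np.array([1.0, 0.5, 2.0, 1.0])   # homogeneous coordinates
p_base = T_base_cam @ p_cam

# The camera-to-lidar transform follows by composition, so any sensor pair
# can exchange measurements once the per-sensor extrinsics are known.
T_lidar_cam = np.linalg.inv(T_base_lidar) @ T_base_cam
p_lidar = T_lidar_cam @ p_cam

print("point in base frame: ", p_base[:3])
print("point in lidar frame:", p_lidar[:3])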
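Likewise, the abstract only names the RGB-D visual odometry contribution without detail. As a point of reference, a core building block in many odometry and mapping pipelines is estimating the rigid motion between two corresponded 3D point sets, solved in closed form by the Kabsch/Procrustes method. The sketch below shows that step under the (strong) assumption of known correspondences; it is generic background, not the dissertation's actual pipeline.

import numpy as np

def rigid_align(src, dst):
    # Return R (3x3) and t (3,) minimizing sum ||R @ src_i + t - dst_i||^2
    # over corresponded point rows (Kabsch algorithm).
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic check: recover a known frame-to-frame motion.
rng = np.random.default_rng(0)
pts = rng.standard_normal((100, 3))
theta = 0.1
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.05, -0.02, 0.01])
moved = pts @ R_true.T + t_true
R_est, t_est = rigid_align(pts, moved)
assert np.allclose(R_est, R_true) and np.allclose(t_est, t_true)

Real RGB-D odometry must also establish the correspondences (e.g., via feature matching or iterative closest point) and handle outliers, which is where methods differ.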