Visual Perception for Robotic Spatial Understanding
Main Author: | Owens, Jason
---|---
Format: | Dissertation
Language: | English
Published: | ProQuest Dissertations & Theses, 2019
Subjects: | Artificial intelligence; Computer science; Robotics
ISBN: | 9781085567572
Online Access: | https://www.proquest.com/docview/2272720370
Abstract: Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. In contrast, we must devise algorithmic methods of taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot or any intelligent system that is meant to interact with the world that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don't yet have a mechanism that detects pose that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities. Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3-D over-segmentation technique that utilizes the models and ego-motion output in the previous step to generate temporally consistent segmentations with camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet.
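The calibration problem the abstract alludes to, computing relative extrinsic poses so that data from many sensors can be integrated in a single frame, reduces in its simplest form to composing rigid-body transforms. The sketch below is a minimal NumPy illustration of that idea only; the sensor names, poses, and point values are invented for the example and are not taken from the dissertation.

```python
# Illustrative sketch (not the dissertation's method): known extrinsic poses
# let measurements from several sensors be expressed in one common robot frame.
import numpy as np

def make_pose(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def transform_points(T: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 4x4 transform to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    return (T @ homogeneous.T).T[:, :3]

# Hypothetical extrinsics: camera and lidar poses given in the robot base frame.
T_base_camera = make_pose(np.eye(3), np.array([0.2, 0.0, 0.5]))
T_base_lidar = make_pose(np.eye(3), np.array([0.0, 0.0, 0.8]))

# Points measured in each sensor's own frame.
camera_points = np.array([[1.0, 0.1, 2.0]])
lidar_points = np.array([[1.2, 0.1, 1.7]])

# With calibrated extrinsics, both measurements land in the same base frame,
# which is what makes multi-sensor fusion on a single robot possible.
fused = np.vstack([
    transform_points(T_base_camera, camera_points),
    transform_points(T_base_lidar, lidar_points),
])
print(fused)
```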