Attention-Based Grasp Detection With Monocular Depth Estimation

Bibliographic Details
Published in: IEEE Access, Vol. 12, pp. 65041-65057
Main Authors: Xuan Tan, Phan; Hoang, Dinh-Cuong; Nguyen, Anh-Nhat; Nguyen, Van-Thiep; Vu, Van-Duc; Nguyen, Thu-Uyen; Hoang, Ngoc-Anh; Phan, Khanh-Toan; Tran, Duc-Thanh; Vu, Duy-Quang; Ngo, Phuc-Quan; Duong, Quang-Tri; Ho, Ngoc-Trung; Tran, Cong-Trinh; Duong, Van-Hiep; Mai, Anh-Truong
Format: Journal Article
Language: English
Published: Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2024
Description
Summary: Grasp detection plays a pivotal role in robotic manipulation, allowing robots to interact with and manipulate objects in their surroundings. Traditionally, this has relied on three-dimensional (3D) point cloud data acquired from specialized depth cameras. However, the limited availability of such sensors in real-world scenarios poses a significant challenge. In many practical applications, robots operate in diverse environments where obtaining high-quality 3D point cloud data may be impractical or impossible. This paper introduces an innovative approach to grasp generation using color images, thereby eliminating the need for dedicated depth sensors. Our method capitalizes on advanced deep learning techniques for depth estimation directly from color images. Instead of relying on conventional depth sensors, our approach computes predicted point clouds based on estimated depth images derived directly from Red-Green-Blue (RGB) input data. To our knowledge, this is the first study to explore the use of predicted depth data for grasp detection, moving away from the traditional dependence on depth sensors. The novelty of this work is the development of a fusion module that seamlessly integrates features extracted from RGB images with those inferred from the predicted point clouds. Additionally, we adapt a voting mechanism from our previous work (VoteGrasp) to enhance robustness to occlusion and generate collision-free grasps. Experimental evaluations conducted on standard datasets validate the effectiveness of our approach, demonstrating its superior performance in generating grasp configurations compared to existing methods. With our proposed method, we achieved a significant 4% improvement in average precision compared to state-of-the-art grasp detection methods. Furthermore, our method demonstrates promising practical viability through real robot grasping experiments, achieving an impressive 84% success rate.
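Note: The pipeline described above hinges on converting an estimated depth image into a "predicted" point cloud before fusing its features with the RGB features. The sketch below is a rough illustration of that intermediate step only, not the authors' implementation; it back-projects a depth map through a standard pinhole camera model, and the intrinsics (fx, fy, cx, cy) and the dummy depth values are placeholder assumptions.

# Minimal sketch (illustrative, not the paper's code): back-project an
# estimated monocular depth map into a 3D point cloud via pinhole intrinsics.
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Convert an HxW depth map (metres) into an Nx3 point cloud."""
    h, w = depth.shape
    # Pixel grid: u runs along image columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx   # back-projected x coordinate
    y = (v - cy) * z / fy   # back-projected y coordinate
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop invalid (zero-depth) pixels

if __name__ == "__main__":
    # Dummy depth map standing in for a depth-estimation network's output;
    # intrinsics are assumed example values, not taken from the paper.
    predicted_depth = np.full((480, 640), 0.8, dtype=np.float32)
    cloud = depth_to_point_cloud(predicted_depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    print(cloud.shape)   # (307200, 3)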
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3397718