A RGB-D feature fusion network for occluded object 6D pose estimation


Bibliographic Details
Published in: Signal, Image and Video Processing, Vol. 18, no. 8-9, pp. 6309-6319
Main Authors: Song, Yiwei; Tang, Chunhui
Format: Journal Article
Language: English
Published: London: Springer London (Springer Nature B.V.), 01-09-2024
Description
Summary: 6D pose estimation from RGB-D data is widely used across many scenarios, and keypoint-based methods have received particular attention for their strong performance. However, these methods still face challenges when the object is heavily occluded or truncated. To address this issue, we propose a novel cross-modal fusion network. Our approach first applies object detection to locate the object's likely position and randomly samples pixels within that region. A specially designed feature extraction network then extracts appearance features from the RGB image and geometry features from the depth image; these features are implicitly aggregated through cross-modal fusion. Finally, keypoints are used to estimate the object's pose. The proposed method is evaluated extensively on the Occlusion Linemod and Truncation Linemod datasets. Experimental results demonstrate significant improvements, validating the effectiveness of the cross-modal feature fusion strategy for keypoint-based pose estimation from RGB-D images.
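The pipeline summarized above (detect, sample pixels in the detected region, extract per-pixel appearance and geometry features, fuse them) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the feature extractors are stand-in random projections, the fusion is plain concatenation rather than the paper's learned implicit aggregation, and all names, shapes, and the camera intrinsics are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_in_bbox(bbox, n):
    """Randomly sample n pixel coordinates inside a detected bounding box."""
    x0, y0, x1, y1 = bbox
    xs = rng.integers(x0, x1, size=n)
    ys = rng.integers(y0, y1, size=n)
    return np.stack([xs, ys], axis=1)  # (n, 2) pixel coordinates

def appearance_features(rgb, pix, dim=32):
    """Stand-in for the RGB branch: embed sampled colours with a random projection."""
    colours = rgb[pix[:, 1], pix[:, 0]].astype(np.float32) / 255.0  # (n, 3)
    W = rng.standard_normal((3, dim)).astype(np.float32)
    return colours @ W  # (n, dim) appearance features

def geometry_features(depth, pix, K, dim=32):
    """Stand-in for the depth branch: back-project pixels to 3D, then embed."""
    fx, fy, cx, cy = K
    z = depth[pix[:, 1], pix[:, 0]].astype(np.float32)
    x = (pix[:, 0] - cx) * z / fx          # pinhole back-projection
    y = (pix[:, 1] - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)      # (n, 3) partial point cloud
    W = rng.standard_normal((3, dim)).astype(np.float32)
    return pts @ W  # (n, dim) geometry features

def fuse(app, geo):
    """Naive cross-modal fusion by concatenation; the paper's fusion is learned."""
    return np.concatenate([app, geo], axis=1)  # (n, 2*dim) fused features

# Dummy 64x64 RGB-D frame and assumed intrinsics (fx, fy, cx, cy).
rgb = rng.integers(0, 256, size=(64, 64, 3))
depth = rng.uniform(0.5, 2.0, size=(64, 64))
pix = sample_in_bbox((10, 10, 50, 50), n=128)
feat = fuse(appearance_features(rgb, pix),
            geometry_features(depth, pix, (60.0, 60.0, 32.0, 32.0)))
print(feat.shape)  # per-pixel fused features, fed to a keypoint regression head
```

In the actual method, the fused per-pixel features would feed a network that regresses keypoint locations, from which the 6D pose is recovered; here they are just printed to show the shapes involved.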
ISSN: 1863-1703; 1863-1711
DOI: 10.1007/s11760-024-03318-7