A RGB-D feature fusion network for occluded object 6D pose estimation
Published in: Signal, Image and Video Processing, Vol. 18, No. 8-9, pp. 6309-6319
Main Authors:
Format: Journal Article
Language: English
Published: London: Springer London; Springer Nature B.V., 01-09-2024
Summary: 6D pose estimation from RGB-D data is widely used across many scenarios, and keypoint-based methods have received particular attention for their strong performance. However, these methods still face challenges, especially when the object is heavily occluded or truncated. To address this issue, we propose a novel cross-modal fusion network. Specifically, our approach first employs object detection to locate the candidate region of the object and randomly samples points within it. A specially designed feature extraction network then extracts appearance features from the RGB image and geometry features from the depth image; these features are implicitly aggregated through cross-modal fusion. Finally, keypoints are used to estimate the object's pose. The proposed method is evaluated extensively on the Occlusion Linemod and Truncation Linemod datasets. Experimental results show significant improvements, validating the effectiveness of the cross-modal feature fusion strategy in enhancing the accuracy of keypoint-based pose estimation from RGB-D images.
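The summary outlines a concrete pipeline: sample points inside a detected region, extract appearance features from RGB and geometry features from depth, fuse them per point, and regress keypoints. Below is a minimal PyTorch sketch of that kind of per-point cross-modal fusion. It assumes a DenseFusion-style design; every module name, channel size, and the keypoint-offset head are illustrative assumptions, not the authors' published architecture.

```python
# Minimal sketch of per-point cross-modal fusion for keypoint-based
# pose estimation. All modules and dimensions are illustrative
# assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, rgb_dim=32, geo_dim=32, n_keypoints=8):
        super().__init__()
        # Appearance branch: 1x1 convs stand in for a CNN producing
        # per-pixel features at the sampled point locations.
        self.rgb_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(), nn.Conv1d(64, rgb_dim, 1))
        # Geometry branch: a PointNet-like MLP over (x, y, z) points
        # back-projected from the depth image.
        self.geo_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(), nn.Conv1d(64, geo_dim, 1))
        # Implicit aggregation: fuse concatenated per-point features.
        self.fuse = nn.Sequential(
            nn.Conv1d(rgb_dim + geo_dim, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 128, 1))
        # Keypoint head: per-point offsets to each 3D keypoint, from
        # which a pose can later be recovered (e.g., least squares).
        self.kp_head = nn.Conv1d(128 + 128, 3 * n_keypoints, 1)
        self.n_keypoints = n_keypoints

    def forward(self, rgb_feats, points):
        # rgb_feats: (B, 3, N) colors sampled inside the detected box
        # points:    (B, 3, N) corresponding 3D points from depth
        a = self.rgb_mlp(rgb_feats)              # (B, rgb_dim, N)
        g = self.geo_mlp(points)                 # (B, geo_dim, N)
        f = self.fuse(torch.cat([a, g], dim=1))  # (B, 128, N)
        # Max-pool a global descriptor and broadcast it to each point.
        glob = f.max(dim=2, keepdim=True).values.expand_as(f)
        off = self.kp_head(torch.cat([f, glob], dim=1))
        B, _, N = off.shape
        return off.view(B, self.n_keypoints, 3, N)

if __name__ == "__main__":
    net = CrossModalFusion()
    rgb = torch.randn(2, 3, 500)   # 500 randomly sampled points
    xyz = torch.randn(2, 3, 500)
    print(net(rgb, xyz).shape)     # torch.Size([2, 8, 3, 500])
```

Concatenating each point's fused feature with a max-pooled global descriptor is one common way to let local keypoint predictions see object-level context, which matters when much of the object is occluded.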
ISSN: 1863-1703 (print); 1863-1711 (electronic)
DOI: 10.1007/s11760-024-03318-7