Learning Visual Hierarchies with Hyperbolic Embeddings
Main Authors: | , , , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 26-11-2024 |
Subjects: | |
Summary: | Structuring latent representations in a hierarchical manner enables models to
learn patterns at multiple levels of abstraction. However, most prevalent image
understanding models focus on visual similarity, and learning visual
hierarchies is relatively unexplored. In this work, for the first time, we
introduce a learning paradigm that can encode user-defined multi-level visual
hierarchies in hyperbolic space without requiring explicit hierarchical labels.
As a concrete example, first, we define a part-based image hierarchy using
object-level annotations within and across images. Then, we introduce an
approach to enforce the hierarchy using contrastive loss with pairwise
entailment metrics. Finally, we discuss new evaluation metrics to effectively
measure hierarchical image retrieval. Encoding these complex relationships
ensures that the learned representations capture semantic and structural
information that transcends mere visual similarity. Experiments in part-based
image retrieval show significant improvements in hierarchical retrieval tasks,
demonstrating the capability of our model in capturing visual hierarchies. |
DOI: | 10.48550/arxiv.2411.17490 |