Are These the Same Apple? Comparing Images Based on Object Intrinsics
The human visual system can effortlessly recognize an object under different extrinsic factors such as lighting, object poses, and background, yet current computer vision systems often struggle with these variations. An important step to understanding and improving artificial vision systems is to me...
Saved in:
Main Authors: | , , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: |
01-11-2023
|
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The human visual system can effortlessly recognize an object under different
extrinsic factors such as lighting, object poses, and background, yet current
computer vision systems often struggle with these variations. An important step
to understanding and improving artificial vision systems is to measure image
similarity purely based on intrinsic object properties that define object
identity. This problem has been studied in the computer vision literature as
re-identification, though mostly restricted to specific object categories such
as people and cars. We propose to extend it to general object categories,
exploring an image similarity metric based on object intrinsics. To benchmark
such measurements, we collect the Common paired objects Under differenT
Extrinsics (CUTE) dataset of $18,000$ images of $180$ objects under different
extrinsic factors such as lighting, poses, and imaging conditions. While
existing methods such as LPIPS and CLIP scores do not measure object intrinsics
well, we find that combining deep features learned from contrastive
self-supervised learning with foreground filtering is a simple yet effective
approach to approximating the similarity. We conduct an extensive survey of
pre-trained features and foreground extraction methods to arrive at a strong
baseline that best measures intrinsic object-centric image similarity among
current methods. Finally, we demonstrate that our approach can aid in
downstream applications such as acting as an analog for human subjects and
improving generalizable re-identification. Please see our project website at
https://s-tian.github.io/projects/cute/ for visualizations of the data and
demos of our metric. |
---|---|
DOI: | 10.48550/arxiv.2311.00750 |