Adversarial Attacks on Foundational Vision Models
Format: Journal Article
Language: English
Published: 28-08-2023
Summary: Rapid progress is being made in developing large, pretrained, task-agnostic foundational vision models such as CLIP, ALIGN, and DINOv2. We are approaching the point where these models no longer need to be fine-tuned for downstream tasks and can simply be used zero-shot or with a lightweight probing head. Critically, given the complexity of working at this scale, a bottleneck has emerged: relatively few organizations in the world train these models and then share them on centralized platforms such as HuggingFace and torch.hub. The goal of this work is to identify several key adversarial vulnerabilities of these models in an effort to make future designs more robust. Intuitively, our attacks manipulate deep feature representations to fool an out-of-distribution (OOD) detector, which will be required when using these open-world-aware models to solve closed-set downstream tasks. Our methods reliably cause in-distribution (ID) images (with respect to a downstream task) to be predicted as OOD, and vice versa, while operating under threat models with extremely low knowledge assumptions. We show our attacks to be potent in both whitebox and blackbox settings, as well as when transferred across foundational model types (e.g., attacking DINOv2 with CLIP). This work is only the beginning of a long journey towards adversarially robust foundational vision models.
DOI: 10.48550/arxiv.2308.14597
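
The summary describes attacks that perturb an image so its deep features fool an OOD detector sitting on top of a frozen foundational encoder. As a rough illustration only, the sketch below shows a generic PGD-style feature-space attack against a prototype-based OOD score; it is not the paper's exact method, and every name (`feature_extractor`, `prototypes`, the max-cosine-similarity score, the budget values) is an illustrative assumption.

```python
# Hypothetical sketch: push an in-distribution image's features away from the
# class prototypes that a cosine-similarity OOD detector relies on, so the
# image is flagged as OOD. Assumes a frozen pretrained image encoder (e.g., a
# CLIP or DINOv2 backbone) is already loaded as `feature_extractor`.
import torch

def id_to_ood_attack(feature_extractor: torch.nn.Module,
                     x: torch.Tensor,            # (B, 3, H, W) ID images in [0, 1]
                     prototypes: torch.Tensor,   # (K, D) class-mean features of the ID set
                     epsilon: float = 8 / 255,   # L-inf perturbation budget (assumed)
                     alpha: float = 2 / 255,     # step size (assumed)
                     steps: int = 20) -> torch.Tensor:
    """Return adversarial images whose features score as OOD under a
    max-cosine-similarity-to-prototype detector (one common OOD score)."""
    feature_extractor.eval()
    delta = torch.zeros_like(x, requires_grad=True)
    protos = torch.nn.functional.normalize(prototypes, dim=-1)

    for _ in range(steps):
        feats = feature_extractor(torch.clamp(x + delta, 0.0, 1.0))
        feats = torch.nn.functional.normalize(feats, dim=-1)
        # OOD score proxy: highest cosine similarity to any ID prototype.
        # Minimizing it makes the detector more likely to flag the image as OOD.
        score = (feats @ protos.T).max(dim=-1).values.mean()
        score.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # descend on the ID score
            delta.clamp_(-epsilon, epsilon)      # project back into the L-inf ball
            delta.grad.zero_()

    return torch.clamp(x + delta, 0.0, 1.0).detach()
```

For the reverse direction described in the summary (making OOD images look ID), the same loop applies with the update sign flipped so the similarity to the prototypes is maximized instead of minimized.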