PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection

Bibliographic Details
Main Authors:	Zhang, Tianhao; Chen, Zhixiang; Mihaylova, Lyudmila S
Format:	Journal Article
Language:	English
Published:	27-10-2024
Subjects:	Computer Science - Computer Vision and Pattern Recognition

Description
Summary:	Vision Transformers (ViTs) have achieved remarkable success across a range of vision tasks, yet their robustness to data distribution shifts and their inherent inductive biases remain underexplored. To make ViT models more robust for image Out-of-Distribution (OOD) detection, we introduce a novel and generic framework named Prior-augmented Vision Transformer (PViT). PViT identifies OOD samples by quantifying the divergence between the predicted class logits and the prior logits obtained from pre-trained models. Unlike existing state-of-the-art OOD detection methods, PViT shapes the decision boundary between in-distribution (ID) and OOD samples using the proposed prior guide confidence, without requiring additional data modeling, generation methods, or structural modifications. Extensive experiments on the large-scale ImageNet benchmark demonstrate that PViT significantly outperforms existing state-of-the-art OOD detection methods. Additionally, through comprehensive analyses, ablation studies, and discussions, we show how PViT can address specific challenges in managing large vision models, paving the way for new advancements in OOD detection.
DOI:	10.48550/arxiv.2410.20631
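
Illustration:	The summary says PViT scores a sample by the divergence between its predicted class logits and the prior logits from a pre-trained model. The record does not state which divergence or threshold is used, so the sketch below is only an assumption-laden illustration, not the authors' method: it assumes a KL divergence between the two softmax distributions, and the function names and the threshold value are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits to probabilities (shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def divergence_score(predicted_logits, prior_logits):
    """Assumed score: KL divergence from the prior distribution to the prediction.

    The choice of KL is an assumption made for this sketch; the record only
    says that a divergence between the two sets of logits is quantified.
    """
    prior = softmax(prior_logits)
    predicted = softmax(predicted_logits)
    return kl_divergence(prior, predicted)


def is_ood(predicted_logits, prior_logits, threshold=0.5):
    """Flag a sample as OOD when the score exceeds a (hypothetical) threshold."""
    return divergence_score(predicted_logits, prior_logits) > threshold


if __name__ == "__main__":
    # Prediction agrees with the prior: low divergence.
    print(divergence_score([2.0, 0.1, -1.0], [2.0, 0.1, -1.0]))
    # Prediction disagrees with the prior: high divergence.
    print(divergence_score([-3.0, 0.0, 4.0], [4.0, 0.0, -3.0]))
```

In practice a threshold like this would be tuned on held-out in-distribution data, for example to reach a target true-positive rate; the value 0.5 here is a placeholder.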