Illicit object detection in X-ray images using Vision Transformers
Main Authors: | , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 27-03-2024 |
Subjects: | |
Summary: | Illicit object detection is a critical task performed at various high-security locations, including airports, train stations, subways, and ports. The continuous and tedious work of examining thousands of X-ray images per hour can be mentally taxing. Thus, Deep Neural Networks (DNNs) can be used to automate the X-ray image analysis process, improve efficiency and alleviate the security officers' inspection burden. The neural architectures typically utilized in relevant literature are Convolutional Neural Networks (CNNs), with Vision Transformers (ViTs) rarely employed. In order to address this gap, this paper conducts a comprehensive evaluation of relevant ViT architectures on illicit item detection in X-ray images. This study utilizes both Transformer and hybrid backbones, such as SWIN and NextViT, and detectors, such as DINO and RT-DETR. The results demonstrate the remarkable accuracy of the DINO Transformer detector in the low-data regime, the impressive real-time performance of YOLOv8, and the effectiveness of the hybrid NextViT backbone. |
DOI: | 10.48550/arxiv.2403.19043 |