MOVES: Motion-Oriented VidEo Sampling for Natural Language-Based Vehicle Retrieval
Published in: | 2024 IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) pp. 1 - 7 |
---|---|
Main Authors: | , , , , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE 15-07-2024 |
Summary: | Retrieving a target vehicle from natural language descriptions plays a crucial role in intelligent transportation systems. Existing methods tackle this task with models that exploit the correlation between textual and visual representations, such as CLIP. However, these models struggle to capture the temporal characteristics of video data, so researchers have tried to improve temporal understanding through various data augmentation techniques and video encoders. Even so, previous approaches often overlook the detailed temporal characteristics of vehicles. To overcome this limitation, we introduce MOVES, a Motion-Oriented VidEo Sampling method that effectively utilizes the motion information of the target vehicle. Furthermore, we construct a robust model by implementing a re-ranking algorithm that accounts for a variety of vehicle attributes. As a result, our proposed model achieves state-of-the-art performance on the public vehicle retrieval dataset. |
---|---|
ISSN: | 2643-6213 |
DOI: | 10.1109/AVSS61716.2024.10672583 |