Relation-aware aggregation network with auxiliary guidance for text-based person search

Bibliographic Details
Published in: World Wide Web (Bussum), Vol. 25, No. 4, pp. 1565-1582
Main Authors: Zeng, Pengpeng, Jing, Shuaiqi, Song, Jingkuan, Fan, Kaixuan, Li, Xiangpeng, We, Liansuo, Guo, Yuan
Format: Journal Article
Language: English
Published: New York: Springer US, 01-07-2022
Springer Nature B.V.
Description
Summary: In this paper, we propose a novel Relation-aware Aggregation Network with Auxiliary Guidance for text-based person search, namely RANAG. Existing works still struggle to capture the detailed appearance of a person and to compute the similarity between images and texts. RANAG is designed to address this problem from two aspects: relation-aware visual features and additional auxiliary signals. Specifically, we introduce a Relation-aware Aggregation Network (RAN) that exploits the relations between the person and local objects. We then propose three auxiliary tasks to acquire additional knowledge of semantic representations, each with its own objective: identifying the gender of the pedestrian in the image, distinguishing images of similar pedestrians, and aligning the semantic information between the description and the image. In addition, the data augmentation methods we explored can further improve performance. Extensive experiments demonstrate that our model outperforms state-of-the-art methods on the CUHK-PEDES dataset.
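The record does not include the authors' code, but the multi-task design described in the abstract can be illustrated with a short sketch. Below is a minimal PyTorch-style example of how the two classification-based auxiliary tasks (gender identification and pedestrian identity discrimination) might be combined with a cross-modal image-text alignment loss. All module names, loss weights, the temperature, and the identity count are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryGuidanceLoss(nn.Module):
    """Hypothetical multi-task objective combining the auxiliary tasks
    described in the abstract with a cross-modal matching loss.
    Head dimensions and loss weights are assumptions, not the paper's."""

    def __init__(self, feat_dim=512, num_ids=11003,
                 w_gender=0.1, w_id=1.0, w_align=1.0):
        super().__init__()
        self.gender_head = nn.Linear(feat_dim, 2)    # task: gender classification
        self.id_head = nn.Linear(feat_dim, num_ids)  # task: distinguish similar pedestrians
        self.w_gender, self.w_id, self.w_align = w_gender, w_id, w_align

    def forward(self, img_feat, txt_feat, gender_labels, id_labels):
        # Alignment task: symmetric InfoNCE-style loss over matched
        # image-text pairs in the batch (temperature is an assumption).
        img = F.normalize(img_feat, dim=-1)
        txt = F.normalize(txt_feat, dim=-1)
        logits = img @ txt.t() / 0.07
        targets = torch.arange(img.size(0), device=img.device)
        loss_align = (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets)) / 2

        # Auxiliary task: identify the pedestrian's gender from the image feature.
        loss_gender = F.cross_entropy(self.gender_head(img_feat), gender_labels)

        # Auxiliary task: identity classification to separate similar pedestrians.
        loss_id = F.cross_entropy(self.id_head(img_feat), id_labels)

        return (self.w_align * loss_align
                + self.w_gender * loss_gender
                + self.w_id * loss_id)

# Usage with random features standing in for encoder outputs:
loss_fn = AuxiliaryGuidanceLoss()
img_feat = torch.randn(32, 512)
txt_feat = torch.randn(32, 512)
loss = loss_fn(img_feat, txt_feat,
               torch.randint(0, 2, (32,)),
               torch.randint(0, 11003, (32,)))
loss.backward()
```

The sketch treats the auxiliary objectives as extra classification heads over shared features, a common pattern for auxiliary supervision; the paper's actual heads, weights, and alignment formulation may differ.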
ISSN: 1386-145X
EISSN: 1573-1413
DOI: 10.1007/s11280-021-00953-9