Relation-aware aggregation network with auxiliary guidance for text-based person search

Bibliographic Details
Published in: World Wide Web (Bussum), Vol. 25, No. 4, pp. 1565-1582
Main Authors: Zeng, Pengpeng, Jing, Shuaiqi, Song, Jingkuan, Fan, Kaixuan, Li, Xiangpeng, We, Liansuo, Guo, Yuan
Format: Journal Article
Language: English
Published: New York: Springer US, 01-07-2022
Springer Nature B.V.
Description
Summary: In this paper, we propose a novel Relation-aware Aggregation Network with Auxiliary Guidance for text-based person search, namely RANAG. Existing works still struggle to capture the detailed appearance of a person and to compute the similarity between images and texts. RANAG is designed to address this problem from two aspects: relation-aware visual features and additional auxiliary signals. Specifically, we introduce a Relation-aware Aggregation Network (RAN) that exploits the relations between the person and local objects. We then propose three auxiliary tasks to acquire additional knowledge of semantic representations, each with its own objective: identifying the gender of the pedestrian in the image, distinguishing images of similar pedestrians, and aligning the semantic information between the description and the image. In addition, the data augmentation methods we explored can further improve performance. Extensive experiments demonstrate that our model outperforms state-of-the-art methods on the CUHK-PEDES dataset.
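The record does not include the authors' code, but the multi-task design described in the abstract can be illustrated with a short sketch. Below is a minimal PyTorch-style example of how the two classification-based auxiliary tasks (gender identification and pedestrian identity discrimination) might be combined with a cross-modal image-text alignment loss. All module names, loss weights, the temperature, and the identity count are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryGuidanceLoss(nn.Module):
    """Hypothetical multi-task objective combining the auxiliary tasks
    described in the abstract with a cross-modal matching loss.
    Head dimensions and loss weights are assumptions, not the paper's."""

    def __init__(self, feat_dim=512, num_ids=11003,
                 w_gender=0.1, w_id=1.0, w_align=1.0):
        super().__init__()
        self.gender_head = nn.Linear(feat_dim, 2)    # task: gender classification
        self.id_head = nn.Linear(feat_dim, num_ids)  # task: distinguish similar pedestrians
        self.w_gender, self.w_id, self.w_align = w_gender, w_id, w_align

    def forward(self, img_feat, txt_feat, gender_labels, id_labels):
        # Alignment task: symmetric InfoNCE-style loss over matched
        # image-text pairs in the batch (temperature is an assumption).
        img = F.normalize(img_feat, dim=-1)
        txt = F.normalize(txt_feat, dim=-1)
        logits = img @ txt.t() / 0.07
        targets = torch.arange(img.size(0), device=img.device)
        loss_align = (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets)) / 2

        # Auxiliary task: identify the pedestrian's gender from the image feature.
        loss_gender = F.cross_entropy(self.gender_head(img_feat), gender_labels)

        # Auxiliary task: identity classification to separate similar pedestrians.
        loss_id = F.cross_entropy(self.id_head(img_feat), id_labels)

        return (self.w_align * loss_align
                + self.w_gender * loss_gender
                + self.w_id * loss_id)

# Usage with random features standing in for encoder outputs:
loss_fn = AuxiliaryGuidanceLoss()
img_feat = torch.randn(32, 512)
txt_feat = torch.randn(32, 512)
loss = loss_fn(img_feat, txt_feat,
               torch.randint(0, 2, (32,)),
               torch.randint(0, 11003, (32,)))
loss.backward()
```

The sketch treats the auxiliary objectives as extra classification heads over shared features, a common pattern for auxiliary supervision; the paper's actual heads, weights, and alignment formulation may differ.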
ISSN: 1386-145X
EISSN: 1573-1413
DOI: 10.1007/s11280-021-00953-9