Deep correspondence restricted Boltzmann machine for cross-modal retrieval

The task of cross-modal retrieval, i.e., using a text query to search for images or vice versa, has received considerable attention with the rapid growth of multi-modal web data. Modeling the correlations between different modalities is the key to tackle this problem. In this paper, we propose a cor...

Full description

Saved in:

Bibliographic Details
Published in:	Neurocomputing (Amsterdam) Vol. 154; pp. 50 - 60
Main Authors:	Feng, Fangxiang, Li, Ruifan, Wang, Xiaojie
Format:	Journal Article
Language:	English
Published:	Elsevier B.V 22-04-2015
Subjects:	Cross-modal Deep Learning Multi-modal RBM Retrieval Retrieval Multi-modal RBM Cross-modal Deep Learning
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	The task of cross-modal retrieval, i.e., using a text query to search for images or vice versa, has received considerable attention with the rapid growth of multi-modal web data. Modeling the correlations between different modalities is the key to tackle this problem. In this paper, we propose a correspondence restricted Boltzmann machine (Corr-RBM) to map the original features of bimodal data, such as image and text in our setting, into a low-dimensional common space, in which the heterogeneous data are comparable. In our Corr-RBM, two RBMs built for image and text, respectively are connected at their individual hidden representation layers by a correlation loss function. A single objective function is constructed to trade off the correlation loss and likelihoods of both modalities. Through the optimization of this objective function, our Corr-RBM is able to capture the correlations between two modalities and learn the representation of each modality simultaneously. Furthermore, we construct two deep neural structures using Corr-RBM as the main building block for the task of cross-modal retrieval. A number of comparison experiments are performed on three public real-world data sets. All of our models show significantly better results than state-of-the-art models in both searching images via text query and vice versa.
ISSN:	0925-2312 1872-8286
DOI:	10.1016/j.neucom.2014.12.020