Deep correspondence restricted Boltzmann machine for cross-modal retrieval

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 154, pp. 50–60
Main Authors: Feng, Fangxiang; Li, Ruifan; Wang, Xiaojie
Format: Journal Article
Language: English
Published: Elsevier B.V., 22-04-2015
Description
Summary: The task of cross-modal retrieval, i.e., using a text query to search for images or vice versa, has received considerable attention with the rapid growth of multi-modal web data. Modeling the correlations between different modalities is the key to tackling this problem. In this paper, we propose a correspondence restricted Boltzmann machine (Corr-RBM) to map the original features of bimodal data, such as image and text in our setting, into a low-dimensional common space in which the heterogeneous data are comparable. In our Corr-RBM, two RBMs, built for image and text respectively, are connected at their individual hidden representation layers by a correlation loss function. A single objective function is constructed to trade off the correlation loss against the likelihoods of both modalities. By optimizing this objective function, our Corr-RBM is able to capture the correlations between the two modalities and learn the representation of each modality simultaneously. Furthermore, we construct two deep neural structures using the Corr-RBM as the main building block for the task of cross-modal retrieval. A number of comparison experiments are performed on three public real-world data sets. All of our models show significantly better results than state-of-the-art models, both in searching images via a text query and vice versa.
ISSN: 0925-2312
eISSN: 1872-8286
DOI: 10.1016/j.neucom.2014.12.020
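
For concreteness, the single objective described in the summary, trading off the two RBM likelihoods against a correlation loss between the hidden representations, can be sketched as follows. This is a minimal illustration only: the binary-RBM free-energy form of the likelihood terms, the squared-error correlation loss, and the trade-off weight lam are assumptions made for the sketch, not the paper's exact formulation.

```python
import numpy as np

# Sketch of a Corr-RBM-style objective: two RBMs (image, text) whose hidden
# layers share a common dimensionality, tied together by a correlation loss.
# The free-energy likelihood terms and the squared-error correlation loss
# are illustrative assumptions, not the paper's exact formulation.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def free_energy(v, W, b_vis, b_hid):
    # Binary-RBM free energy: F(v) = -v.b_vis - sum_j log(1 + exp(vW + b_hid)_j).
    # Lowering it raises the (unnormalized) likelihood of the data v.
    return -(v @ b_vis) - np.sum(np.logaddexp(0.0, v @ W + b_hid), axis=-1)

def hidden_repr(v, W, b_hid):
    # Mean hidden activations: the low-dimensional representation of v.
    return sigmoid(v @ W + b_hid)

def corr_rbm_objective(v_img, v_txt, params_img, params_txt, lam=1.0):
    W_i, a_i, b_i = params_img  # weights, visible bias, hidden bias (image RBM)
    W_t, a_t, b_t = params_txt  # weights, visible bias, hidden bias (text RBM)
    # Likelihood terms for both modalities (up to the partition functions).
    nll = free_energy(v_img, W_i, a_i, b_i) + free_energy(v_txt, W_t, a_t, b_t)
    # Correlation loss: distance between the two hidden representations,
    # pushing paired image/text inputs toward the same point in the common space.
    h_i = hidden_repr(v_img, W_i, b_i)
    h_t = hidden_repr(v_txt, W_t, b_t)
    corr = np.sum((h_i - h_t) ** 2, axis=-1)
    return np.mean(nll + lam * corr)

# Toy usage with random parameters and binary inputs.
rng = np.random.default_rng(0)
d_img, d_txt, d_hid, n = 64, 32, 16, 8
params_img = (rng.normal(0.0, 0.01, (d_img, d_hid)), np.zeros(d_img), np.zeros(d_hid))
params_txt = (rng.normal(0.0, 0.01, (d_txt, d_hid)), np.zeros(d_txt), np.zeros(d_hid))
v_img = rng.integers(0, 2, (n, d_img)).astype(float)
v_txt = rng.integers(0, 2, (n, d_txt)).astype(float)
print(corr_rbm_objective(v_img, v_txt, params_img, params_txt, lam=1.0))
```

Minimizing such a combined objective over both RBMs' parameters would learn modality-specific representations and their correspondence at the same time, which is the trade-off the abstract describes; the paper additionally stacks Corr-RBMs into two deep structures for retrieval.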