Image Description Generator using Residual Neural Network and Long Short-Term Memory

Bibliographic Details
Published in: Computer Science Journal of Moldova, Vol. 31, no. 1(91), pp. 3-21
Main Authors: Morampudi, Mahesh Kumar; Gonthina, Nagamani; Bhaskar, Nuthanakanti; Reddy, V. Dinesh
Format: Journal Article
Language: English
Published: Vladimir Andrunachievici Institute of Mathematics and Computer Science, 01-04-2023
Description
Summary: Human beings can easily describe the scenarios and objects in a picture through vision, whereas performing the same task with a computer is complicated. Generating captions for the objects in an image helps everyone understand its scenario better. Automatically describing the content of an image requires both computer vision and natural language processing. This task has gained huge popularity in the field of technology, and a great deal of research is being carried out. Recent works have been successful at identifying objects in an image but face many challenges in accurately generating captions for a given image by understanding the scenario. To address this challenge, we propose a model to generate a caption for an image. A Residual Neural Network (ResNet) is used to extract features from the image; these features are converted into a vector of size 2048. Caption generation is then performed with a Long Short-Term Memory (LSTM) network. The proposed model is evaluated on the Flickr8K dataset and achieves an accuracy of 88.4%. The experimental results indicate that our model produces more appropriate captions than state-of-the-art models.
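As a rough illustration of the pipeline the summary describes, the sketch below (not the authors' code) wires a 2048-dimensional ResNet feature vector and an LSTM over partial captions into a next-word predictor, in the style of the common merge architecture for Flickr8K captioning; the vocabulary size, caption length, and layer widths are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Feature extractor: ResNet50 with global average pooling yields the
# 2048-dimensional image vector mentioned in the abstract.
feature_extractor = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg"
)

vocab_size = 8000  # assumed Flickr8K vocabulary size (illustrative)
max_len = 34       # assumed maximum caption length in tokens (illustrative)

# Image branch: the 2048-dim ResNet feature vector.
image_input = layers.Input(shape=(2048,))
img_vec = layers.Dense(256, activation="relu")(layers.Dropout(0.5)(image_input))

# Text branch: partial caption -> word embedding -> LSTM state.
caption_input = layers.Input(shape=(max_len,))
embedded = layers.Embedding(vocab_size, 256, mask_zero=True)(caption_input)
text_vec = layers.LSTM(256)(layers.Dropout(0.5)(embedded))

# Merge both branches and predict the next word of the caption.
merged = layers.add([img_vec, text_vec])
hidden = layers.Dense(256, activation="relu")(merged)
next_word = layers.Dense(vocab_size, activation="softmax")(hidden)

caption_model = Model(inputs=[image_input, caption_input], outputs=next_word)
caption_model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```

At inference time, such a model generates a caption one token at a time: the predicted word is appended to the partial caption and fed back in until an end token or the maximum length is produced.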
ISSN:1561-4042
2587-4330
DOI:10.56415/csjm.v31.01