Automatic Caption Generation via Attention Based Deep Neural Network Model
Published in: 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), pp. 1-6
Main Authors:
Format: Conference Proceeding
Language: English
Published: IEEE, 03-09-2021
Summary: The ever-increasing volume of visual and multimedia data on the internet has created a need for visual content understanding in multimedia analysis and computer vision. Natural language descriptions of visual content can contribute significantly to this area. Image captioning aims to generate textual descriptions of an image, which can then be used for visual analysis and for understanding the semantics of the content. Various approaches have been proposed for this problem, and in recent years deep learning based models, particularly those incorporating an attention mechanism, have produced better caption generators. Attention-based models focus on what is visually prominent in the image and are therefore capable of producing better captions. In this work, an automatic caption generation model based on an attention mechanism has been implemented and the experimental results are discussed. The model consists of a Convolutional Neural Network (CNN) encoder and a Gated Recurrent Unit (GRU) Recurrent Neural Network (RNN) decoder with a local attention module.
DOI: 10.1109/ICRITO51393.2021.9596255
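
The abstract describes, but does not include code for, a CNN-encoder / GRU-decoder captioner with attention. The sketch below is a minimal illustration of that architecture, not the authors' implementation: the feature and hidden dimensions, the substitution of generic additive (Bahdanau-style) soft attention for the paper's local attention module, and the dummy feature grid are all assumptions made for the example.

```python
# Minimal sketch of a CNN-feature + attention + GRU captioning decoder.
# All dimensions and the attention variant are illustrative assumptions.
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive attention over a grid of CNN feature vectors."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.W_feat = nn.Linear(feat_dim, attn_dim)
        self.W_hid = nn.Linear(hidden_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (B, L, feat_dim); hidden: (B, hidden_dim)
        scores = self.v(torch.tanh(self.W_feat(features) +
                                   self.W_hid(hidden).unsqueeze(1)))  # (B, L, 1)
        weights = torch.softmax(scores, dim=1)                        # (B, L, 1)
        context = (weights * features).sum(dim=1)                     # (B, feat_dim)
        return context, weights

class AttentionGRUDecoder(nn.Module):
    """One decoding step: embed previous word, attend over the image,
    update the GRU state, and predict the next word."""
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = BahdanauAttention(feat_dim, hidden_dim, attn_dim=256)
        self.gru = nn.GRUCell(embed_dim + feat_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, features, hidden):
        context, weights = self.attention(features, hidden)
        x = torch.cat([self.embed(prev_word), context], dim=1)
        hidden = self.gru(x, hidden)
        return self.fc(hidden), hidden, weights

# Smoke test with dummy tensors: a batch of 4 images encoded as an 8x8 grid
# of 2048-d CNN features (e.g., from a pretrained backbone).
if __name__ == "__main__":
    vocab_size, B = 5000, 4
    decoder = AttentionGRUDecoder(vocab_size)
    features = torch.randn(B, 64, 2048)
    hidden = torch.zeros(B, 512)
    prev_word = torch.ones(B, dtype=torch.long)      # e.g., <start> token id
    logits, hidden, weights = decoder(prev_word, features, hidden)
    print(logits.shape, weights.shape)               # (4, 5000), (4, 64, 1)
```

In a full pipeline, `features` would come from a pretrained CNN applied to the input image, and this decoding step would be run in a loop (with teacher forcing at training time) until an end-of-sentence token is produced; the returned attention weights are what allow such models to "visualize" which image regions drive each word.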