Automatic Caption Generation via Attention Based Deep Neural Network Model

Bibliographic Details
Published in: 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), pp. 1-6
Main Authors: Tiwari, Vasudha; Bhatnagar, Charul
Format: Conference Proceeding
Language: English
Published: IEEE, 3 September 2021
Abstract The ever-increasing volume of visual and multimedia data on the internet has created a need for visual content understanding in multimedia analysis and computer vision. Natural-language descriptions of visual content can contribute significantly to this area. Image captioning aims to generate textual descriptions of an image, which can then be used for visual analysis and for understanding the semantics of its content. Various approaches have been proposed for this problem, and recently deep-learning-based models, particularly those that incorporate an attention mechanism, have produced better caption generators. Attention-based models focus on the most prominent regions of the image and are therefore capable of producing better captions. In this work, an automatic caption generator based on an attention mechanism is implemented and the experimental results are discussed. The model consists of a Convolutional Neural Network (CNN) encoder and a Gated Recurrent Unit (GRU) Recurrent Neural Network (RNN) decoder with a local attention module.
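The abstract names the components (a CNN encoder, a GRU decoder and a local attention module) but not how they are wired together. The Python sketch below is an illustration under our own assumptions, not the authors' implementation: the CNN is replaced by a fixed list of region feature vectors, attention scores are plain dot products, "local" attention is modelled as a hypothetical window of `half_width` regions around a chosen `center` index, and the GRU follows the standard update/reset-gate equations with biases omitted. All function names (`attend`, `gru_cell`, `decode_step`), weights and dimensions are hypothetical toy values.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[Vector]  # row-major: m[i] is row i


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(v: Vector) -> Vector:
    """Element-wise logistic function."""
    return [1.0 / (1.0 + math.exp(-t)) for t in v]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features: Matrix, hidden: Vector, center: Optional[int] = None,
           half_width: int = 1) -> Tuple[Vector, Vector]:
    """Dot-product attention of the decoder state over CNN region features.

    Assumes feature and hidden sizes match. If `center` is given, only regions
    within `half_width` of it are scored (a hypothetical "local" window) and
    all other regions get weight 0. Returns (weights, context vector).
    """
    n = len(features)
    keep = [i for i in range(n) if center is None or abs(i - center) <= half_width]
    if not keep:
        raise ValueError("attention window contains no regions")
    scores = [sum(f * h for f, h in zip(features[i], hidden)) for i in keep]
    weights = [0.0] * n
    for i, w in zip(keep, softmax(scores)):
        weights[i] = w
    context = [sum(weights[i] * features[i][d] for i in range(n))
               for d in range(len(features[0]))]
    return weights, context


def gru_cell(x: Vector, h: Vector, p: Dict[str, Matrix]) -> Vector:
    """One GRU update in the Cho et al. (2014) form; biases omitted for brevity."""
    z = sigmoid([a + b for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))])
    r = sigmoid([a + b for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))])
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], reset_h))]
    # z decides how much of the old state to keep versus the candidate state.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]


def decode_step(word_emb: Vector, h: Vector, features: Matrix,
                p: Dict[str, Matrix],
                center: Optional[int] = None) -> Tuple[Vector, Vector]:
    """Attend with the previous state, then feed [embedding; context] to the GRU."""
    weights, context = attend(features, h, center)
    return gru_cell(word_emb + context, h, p), weights


# Toy usage: 4 image regions with 2-D features, a 2-D state and a 2-D embedding.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.2]]
h0 = [0.1, -0.2]
emb = [0.3, 0.7]
# The same hand-set matrices are reused for every gate only to keep this short.
W = [[0.1, 0.2, -0.1, 0.0], [0.0, -0.3, 0.2, 0.1]]  # maps 4-D [emb; context] to 2-D
U = [[0.5, -0.1], [0.2, 0.4]]                        # maps 2-D state to 2-D
params = {"Wz": W, "Uz": U, "Wr": W, "Ur": U, "Wn": W, "Un": U}

h1, w_global = decode_step(emb, h0, features, params)
h1_local, w_local = decode_step(emb, h0, features, params, center=1)
print([round(w, 3) for w in w_global])
print([round(w, 3) for w in w_local])  # region 3 lies outside the window: 0.0
```

Feeding the concatenation of the previous word's embedding and the attention context into the GRU is one common way to condition the decoder on the image; the paper may combine them differently, and a trained model would learn the weight matrices rather than use hand-set ones.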
Author_xml – sequence: 1
  givenname: Vasudha
  surname: Tiwari
  fullname: Tiwari, Vasudha
  email: vasudhatiwari1608@gmail.com
  organization: GLA University, Department of Computer Engineering and Applications, Mathura, India
– sequence: 2
  givenname: Charul
  surname: Bhatnagar
  fullname: Bhatnagar, Charul
  email: charul@gla.ac.in
  organization: GLA University, Department of Computer Engineering and Applications, Mathura, India
DOI 10.1109/ICRITO51393.2021.9596255
EISBN 1665417021
9781665417020
9781665417037
166541703X
SubjectTerms Attention Mechanism
Convolutional Neural Network
Convolutional neural networks
Deep learning
Encoder-Decoder
Gated Recurrent Unit
Generators
Natural languages
Neural Networks
Recurrent Neural Network
Recurrent neural networks
Semantics
Visualization
URI https://ieeexplore.ieee.org/document/9596255