Automatic Caption Generation via Attention Based Deep Neural Network Model

Bibliographic Details
Published in:	2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) pp. 1 - 6
Main Authors:	Tiwari, Vasudha; Bhatnagar, Charul
Format:	Conference Proceeding
Language:	English
Published:	IEEE, 3 September 2021
Subjects:	Attention Mechanism; Convolutional Neural Network; Convolutional neural networks; Deep learning; Encoder-Decoder; Gated Recurrent Unit; Generators; Natural languages; Neural Networks; Recurrent Neural Network; Recurrent neural networks; Semantics; Visualization

Abstract	The ever-increasing volume of visual and multimedia data on the internet has created a need for visual content understanding in multimedia analysis and computer vision. Natural language descriptions of visual content can contribute substantially to this area. Image captioning aims to generate textual descriptions of an image, which can then be used for visual analysis and for understanding the semantics of its content. Various approaches have been proposed for this problem, and in recent years deep-learning-based models, particularly those that incorporate an attention mechanism, have produced better caption generators. Attention-based models focus on what is most prominent in the image and are therefore capable of producing better captions. In this work, an automatic caption generator based on an attention mechanism is implemented and its experimental results are discussed. The model consists of a Convolutional Neural Network (CNN) encoder and a Recurrent Neural Network (RNN) decoder built from a Gated Recurrent Unit (GRU), together with a local attention module.
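The abstract names three interacting components: a CNN encoder, a GRU decoder, and an attention module. As a rough illustration of how they fit together in one decoding step, here is a minimal pure-Python sketch. It is not the authors' implementation and rests on several assumptions: the CNN is replaced by random stand-in region features (as if a feature-map grid had been flattened into region vectors), all weights are random and untrained, biases are omitted, and because the abstract does not specify the form of the paper's local attention module, additive (Bahdanau-style) soft attention over all regions is used as a stand-in. All function and parameter names (`decoder_step`, `make_params`, `W1`, `Wout`, and so on) are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(*vs):
    """Element-wise sum of equal-length vectors."""
    return [sum(t) for t in zip(*vs)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def additive_attention(features, hidden, W1, W2, v):
    """Assumption: additive scoring e_i = v . tanh(W1 f_i + W2 h) stands in
    for the paper's unspecified local attention module."""
    proj_h = matvec(W2, hidden)
    scores = [sum(a * b for a, b in zip(v, tanh(add(matvec(W1, f), proj_h))))
              for f in features]
    weights = softmax(scores)  # one weight per image region, summing to 1
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights

def gru_cell(x, h, p):
    """GRU update (biases omitted); uses the h' = (1 - z) * h + z * h_tilde convention."""
    z = sigmoid(add(matvec(p["Wz"], x), matvec(p["Uz"], h)))  # update gate
    r = sigmoid(add(matvec(p["Wr"], x), matvec(p["Ur"], h)))  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = tanh(add(matvec(p["Wh"], x), matvec(p["Uh"], rh)))  # candidate state
    return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

def make_params(rng, F, E, H, A, V):
    """Random, untrained weights; F=feature dim, E=embedding dim, H=hidden dim,
    A=attention units, V=vocabulary size (all hypothetical sizes)."""
    X = F + E  # GRU input is [context ; word embedding]
    return {
        "W1": rand_matrix(rng, A, F), "W2": rand_matrix(rng, A, H),
        "v": [rng.uniform(-0.1, 0.1) for _ in range(A)],
        "Wz": rand_matrix(rng, H, X), "Uz": rand_matrix(rng, H, H),
        "Wr": rand_matrix(rng, H, X), "Ur": rand_matrix(rng, H, H),
        "Wh": rand_matrix(rng, H, X), "Uh": rand_matrix(rng, H, H),
        "Wout": rand_matrix(rng, V, H),
    }

def decoder_step(features, word_emb, hidden, params):
    """One step: attend over regions, feed [context ; embedding] to the GRU,
    then project the new hidden state to vocabulary logits."""
    context, weights = additive_attention(
        features, hidden, params["W1"], params["W2"], params["v"])
    new_hidden = gru_cell(context + word_emb, hidden, params)  # list concatenation
    logits = matvec(params["Wout"], new_hidden)
    return logits, new_hidden, weights

rng = random.Random(0)
F, E, H, A, V, N = 4, 3, 5, 4, 7, 6
params = make_params(rng, F, E, H, A, V)
features = [[rng.uniform(0, 1) for _ in range(F)] for _ in range(N)]  # stand-in for CNN output
word_emb = [rng.uniform(-1, 1) for _ in range(E)]  # stand-in for a start-token embedding
hidden = [0.0] * H
logits, hidden, weights = decoder_step(features, word_emb, hidden, params)
```

In a trained captioner, `logits` would be converted into a word (for example by argmax), that word's embedding would be fed back as `word_emb`, and the step would repeat until an end token is produced. The `weights` vector, one value per image region, is the quantity that attention visualizations typically overlay on the image.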
Author affiliations	1. Tiwari, Vasudha (vasudhatiwari1608@gmail.com), Department of Computer Engineering and Applications, GLA University, Mathura, India
	2. Bhatnagar, Charul (charul@gla.ac.in), Department of Computer Engineering and Applications, GLA University, Mathura, India
DOI	10.1109/ICRITO51393.2021.9596255
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library Online; IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN	9781665417020 (ISBN-10: 1665417021); 9781665417037 (ISBN-10: 166541703X)
EndPage	6
ExternalDocumentID	9596255
Genre	orig-research
IsPeerReviewed	false
IsScholarly	false
PageCount	6
PublicationCentury	2000
PublicationDate	3 September 2021
PublicationDateYYYYMMDD	2021-09-03
PublicationDecade	2020
PublicationTitle	2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO)
PublicationTitleAbbrev	ICRITO
PublicationYear	2021
Publisher	IEEE
StartPage	1
Title	Automatic Caption Generation via Attention Based Deep Neural Network Model
URI	https://ieeexplore.ieee.org/document/9596255