Automatic Caption Generation via Attention Based Deep Neural Network Model
Published in: | 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) pp. 1 - 6 |
---|---|
Main Authors: | Tiwari, Vasudha; Bhatnagar, Charul |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 03-09-2021 |
Subjects: | |
Abstract | The ever-increasing volume of visual and multimedia data on the internet has created a need for visual content understanding in multimedia analysis and computer vision. Natural language descriptions of visual content can contribute substantially to this area. Image captioning aims to generate textual descriptions of an image, which can then be used for visual analysis and for understanding the semantics of the content. Various approaches have been proposed for this problem; in recent years, deep learning based models, particularly those that incorporate an attention mechanism, have produced better caption generators. Attention-based models tend to focus on what is most prominent in the image and are therefore capable of producing better captions. In this work, an automatic caption generator based on an attention mechanism is implemented and its experimental results are discussed. The model consists of a Convolutional Neural Network (CNN) encoder and a Gated Recurrent Unit (GRU), a type of Recurrent Neural Network (RNN), as the decoder, with a local attention module. |
---|---|
Author | Tiwari, Vasudha Bhatnagar, Charul |
Author_xml | – sequence: 1 givenname: Vasudha surname: Tiwari fullname: Tiwari, Vasudha email: vasudhatiwari1608@gmail.com organization: GLA University,Department of Computer Engineering and Applications,Mathura,India – sequence: 2 givenname: Charul surname: Bhatnagar fullname: Bhatnagar, Charul email: charul@gla.ac.in organization: GLA University,Department of Computer Engineering and Applications,Mathura,India |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/ICRITO51393.2021.9596255 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library Online IEEE Proceedings Order Plans (POP All) 1998-Present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library Online url: http://ieeexplore.ieee.org/Xplore/DynWel.jsp sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 1665417021 9781665417020 9781665417037 166541703X |
EndPage | 6 |
ExternalDocumentID | 9596255 |
Genre | orig-research |
GroupedDBID | 6IE 6IL CBEJK RIE RIL |
ID | FETCH-LOGICAL-i118t-a2c7b6f71a2cb512aa27cc23fa915d363dde0f7df3a39ee6107c4219c6df62de3 |
IEDL.DBID | RIE |
IngestDate | Thu Jun 29 18:38:05 EDT 2023 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i118t-a2c7b6f71a2cb512aa27cc23fa915d363dde0f7df3a39ee6107c4219c6df62de3 |
PageCount | 6 |
ParticipantIDs | ieee_primary_9596255 |
PublicationCentury | 2000 |
PublicationDate | 2021-Sept.-3 |
PublicationDateYYYYMMDD | 2021-09-03 |
PublicationDate_xml | – month: 09 year: 2021 text: 2021-Sept.-3 day: 03 |
PublicationDecade | 2020 |
PublicationTitle | 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) |
PublicationTitleAbbrev | ICRITO |
PublicationYear | 2021 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
Score | 1.8333379 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1 |
SubjectTerms | Attention Mechanism; Convolutional Neural Network; Convolutional neural networks; Deep learning; Encoder-Decoder; Gated Recurrent Unit; Generators; Natural languages; Neural Networks; Recurrent Neural Network; Recurrent neural networks; Semantics; Visualization |
Title | Automatic Caption Generation via Attention Based Deep Neural Network Model |
URI | https://ieeexplore.ieee.org/document/9596255 |
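
The abstract describes a CNN encoder feeding a GRU decoder through a local attention module, but this record gives no further detail. The sketch below is therefore only an illustration of one common form of local attention, not the authors' implementation. It assumes dot-product scoring and a fixed window of image regions around a chosen centre. The names `local_attention`, `features`, `hidden`, `center`, and `half_width` are hypothetical, and plain Python lists stand in for the encoder feature map and the decoder state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention(features, hidden, center, half_width):
    """Attend only to image regions near `center` (assumed windowed scheme).

    features   -- list of region vectors, standing in for CNN encoder output
    hidden     -- current decoder state, standing in for the GRU hidden state
    center     -- index of the region the window is centred on
    half_width -- number of regions included on each side of the centre
    Returns (context vector, attention weights over all regions).
    """
    lo = max(0, center - half_width)
    hi = min(len(features), center + half_width + 1)
    window = features[lo:hi]

    # Dot-product score between each region in the window and the decoder state.
    scores = [sum(f * h for f, h in zip(region, hidden)) for region in window]
    window_weights = softmax(scores)

    # Context vector: weighted sum of the region vectors inside the window.
    dim = len(hidden)
    context = [
        sum(w * region[d] for w, region in zip(window_weights, window))
        for d in range(dim)
    ]

    # Regions outside the window receive zero weight.
    weights = [0.0] * len(features)
    weights[lo:hi] = window_weights
    return context, weights


# Toy example: four 2-dimensional "regions" and a 2-dimensional decoder state.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
hidden = [1.0, 0.0]
context, weights = local_attention(features, hidden, center=1, half_width=1)
```

In a full captioning model the context vector would typically be combined with the previous word embedding and passed to the GRU at each decoding step. Restricting the softmax to a window is what separates this from global attention, which would score every region.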