Folk Games Image Captioning using Object Attention

Bibliographic Details
Published in: Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Online), Vol. 7, No. 4, pp. 758-766
Main Authors: Akbar, Saiful; Sitohang, Benhard; Pardede, Jasman; Amal, Irfan; Yunastrian, Kurnianda; Ahmada, Marsa; Prameswari, Anindya
Format: Journal Article
Language: English
Published: Ikatan Ahli Informatika Indonesia 12-08-2023
Description
Summary: The output of a deep-learning-based image captioning system built on an encoder-decoder framework depends heavily on the image feature extraction technique and on the caption-based model, and the model's accuracy is strongly influenced by the chosen attention mechanism. If the decoder cannot distinguish between the output of the attention model and the input it expects, it can produce incorrect results. In this paper, we propose an object-attention mechanism based on object detection. The object detector outputs bounding boxes and object category labels, which are then used as image input to VGG16 for feature extraction and as input to a caption-based LSTM model. The experimental results show that the system with object attention performs better than the system without it: the BLEU-1, BLEU-2, BLEU-3, BLEU-4, and CIDEr scores of the system with object attention improved by 12.48%, 17.39%, 24.06%, 36.37%, and 43.50%, respectively, compared with the system without object attention.
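The evaluation above is reported in BLEU-1 through BLEU-4. As a rough illustration of what those scores measure, here is a minimal sketch of standard sentence-level BLEU-N: clipped n-gram precision, a geometric mean with uniform weights, and a brevity penalty. This is an assumption for illustration only. The record does not state which BLEU implementation, weighting, smoothing, or corpus-level aggregation the authors used, and CIDEr is not covered here. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference caption."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length
    (ties between references are broken toward the shorter one)."""
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-max_n with uniform weights and no smoothing."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # geometric mean is zero if any n-gram order has no match
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_avg)


# Usage with a made-up caption pair (not data from the paper):
score = bleu("children play a game".split(), ["the children play a folk game".split()], max_n=1)
```

In the usage line every candidate unigram appears in the reference, so BLEU-1 is reduced only by the brevity penalty, exp(1 - 6/4). Higher-order BLEU is stricter because longer n-grams must match, which is why BLEU-4 values are typically much lower than BLEU-1.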
ISSN: 2580-0760
DOI: 10.29207/resti.v7i4.4708