Folk Games Image Captioning using Object Attention
Published in: | Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Online) Vol. 7; no. 4; pp. 758 - 766 |
---|---|
Main Authors: | , , , , , , |
Format: | Journal Article |
Language: | English |
Published: | Ikatan Ahli Informatika Indonesia, 12-08-2023 |
Subjects: | |
Summary: | The performance of a deep learning-based image captioning system with an encoder-decoder framework relies heavily on the image feature extraction technique and the caption-based model. The accuracy of the model is also strongly influenced by the attention mechanism. A failure to distinguish between the output of the attention model and the input expected by the decoder can cause the decoder to give incorrect results. In this paper, we propose an object-attention mechanism using object detection. Object detection outputs a bounding box and an object category label, which are then used as image input to VGG16 for feature extraction and to a caption-based LSTM model. The experimental results show that the system with object attention performs better than the system without it. The BLEU-1, BLEU-2, BLEU-3, BLEU-4, and CIDEr scores of the image captioning system with object attention improved by 12.48%, 17.39%, 24.06%, 36.37%, and 43.50%, respectively, compared to the system without object attention. |
ISSN: | 2580-0760 |
DOI: | 10.29207/resti.v7i4.4708 |
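
The summary describes object detection producing a bounding box and a category label that are then passed on as image input for feature extraction. The record does not give the authors' implementation, so the following is only a minimal sketch of that one step, under stated assumptions: the image is a plain nested list of pixel values, boxes are `(x_min, y_min, x_max, y_max)` with exclusive maxima, and the names `Detection`, `crop_to_box`, and `object_attention_inputs`, as well as the labels `"child"` and `"rope"`, are hypothetical. The VGG16 and LSTM stages are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[int]]  # illustrative grayscale image: rows of pixel values


@dataclass
class Detection:
    """Hypothetical detector output: a category label and a bounding box."""
    label: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def crop_to_box(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Return the image region inside the box, clamped to the image bounds."""
    x_min, y_min, x_max, y_max = box
    height, width = len(image), len(image[0])
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def object_attention_inputs(
    image: Image, detections: List[Detection]
) -> List[Tuple[str, Image]]:
    """Pair each label with its cropped region (assumed feature-extractor input)."""
    return [(d.label, crop_to_box(image, d.box)) for d in detections]


# Toy 4x6 image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
detections = [
    Detection("child", (1, 0, 4, 2)),
    Detection("rope", (3, 2, 9, 4)),  # extends past the right edge; gets clamped
]
regions = object_attention_inputs(image, detections)
```

In a full system, each cropped region would presumably be resized and passed to the feature extractor, with the label available to the caption model; how the paper combines the two is not stated in this record.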