CNN-optimized text recognition with binary embeddings for Arabic expiry date recognition

Recognizing Arabic dot-matrix digits is a challenging problem due to the unique characteristics of dot-matrix fonts, such as irregular dot spacing and varying dot sizes. This paper presents an approach for recognizing Arabic digits printed in dot matrix format. The proposed model is based on convolu...

Full description

Saved in:
Bibliographic Details
Published in:Journal of Electrical Systems and Information Technology Vol. 11; no. 1; pp. 11 - 18
Main Authors: Lotfy, Mohamed, Soliman, Ghada
Format: Journal Article
Language:English
Published: Berlin/Heidelberg Springer Berlin Heidelberg 01-12-2024
Springer Nature B.V
SpringerOpen
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Recognizing Arabic dot-matrix digits is a challenging problem due to the unique characteristics of dot-matrix fonts, such as irregular dot spacing and varying dot sizes. This paper presents an approach for recognizing Arabic digits printed in dot matrix format. The proposed model is based on convolutional neural networks (CNN) that take the dot matrix as input and generate embeddings that are rounded to generate binary representations of the digits. The binary embeddings are then used to perform Optical Character Recognition (OCR) on the date images. To overcome the challenge of the limited availability of dotted Arabic expiration date images, we developed a True Type Font (TTF) for generating synthetic images of Arabic dot-matrix characters. The model was trained on a synthetic dataset of 3287 images and 658 synthetic images for testing, representing realistic expiration dates from 2019 to 2027 in the format of yyyy/mm/dd and yy/mm/dd . Our model achieved an accuracy of 98 . 94% on the expiry date recognition with Arabic dot matrix format using fewer parameters and less computational resources than traditional CNN-based models. By investigating and presenting our findings comprehensively, we aim to contribute substantially to the field of OCR and pave the way for advancements in Arabic dot-matrix character recognition. Our proposed approach is not limited to Arabic dot matrix digit recognition but can be also extended to text recognition tasks, such as text classification and sentiment analysis.
ISSN:2314-7172
2314-7172
DOI:10.1186/s43067-024-00136-2