OCR Error Correction Using BiLSTM

Bibliographic Details
Published in: 2021 International Conference on Electrical, Computer and Energy Technologies (ICECET), pp. 1-5
Main Authors: Kayabas, Ayla; Topcu, Ahmet E.; Kilic, Ozkan
Format: Conference Proceeding
Language: English
Published: IEEE 09-12-2021
Description
Summary: Language models are critically important in the pre- and post-processing stages of optical character recognition (OCR). OCR output also depends on the quality of the documents and scanners: poorer quality produces more errors. Long short-term memory (LSTM) networks suit sequences that span long time intervals because they can capture long-term dependencies. In this study, we evaluate the performance of LSTM-based error correction on OCR data. The results show that a bidirectional LSTM (BiLSTM) corrects erroneous words well: we obtain 98.13% performance on OCR'd data and 97.18% on social media data. These results indicate that the applied method can be used for error correction.
DOI: 10.1109/ICECET52533.2021.9698712
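
Illustrative sketch: the summary names a bidirectional LSTM but this record gives no architecture details, so the following is only a minimal, untrained illustration of the general BiLSTM idea, not the authors' model. It reads a word character by character in both directions and concatenates the two hidden states at each position, so every character is represented with both its left and right context. All names here (`init_lstm`, `lstm_step`, `bilstm_encode`, `ALPHABET`, `HIDDEN`) and the random weights are assumptions made for illustration.

```python
import math
import random
import string

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_lstm(input_size, hidden_size, rng):
    """Random (untrained) weights for one LSTM direction; gates stacked as i, f, g, o."""
    cols = input_size + hidden_size
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
               for _ in range(4 * hidden_size)]
    return {"W": weights, "b": [0.0] * (4 * hidden_size), "n": hidden_size}

def lstm_step(params, x, h, c):
    """One LSTM time step: returns the new hidden state and cell state."""
    n = params["n"]
    xh = x + h  # concatenate the input vector and the previous hidden state
    z = [sum(w * v for w, v in zip(row, xh)) + b
         for row, b in zip(params["W"], params["b"])]
    i = [sigmoid(v) for v in z[0:n]]            # input gate
    f = [sigmoid(v) for v in z[n:2 * n]]        # forget gate
    g = [math.tanh(v) for v in z[2 * n:3 * n]]  # candidate cell values
    o = [sigmoid(v) for v in z[3 * n:4 * n]]    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run_lstm(params, inputs):
    """Run one direction over a sequence and collect the hidden state at each step."""
    h = [0.0] * params["n"]
    c = [0.0] * params["n"]
    outputs = []
    for x in inputs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs

def one_hot(ch, alphabet):
    vec = [0.0] * len(alphabet)
    vec[alphabet.index(ch)] = 1.0
    return vec

def bilstm_encode(word, alphabet, fwd, bwd):
    """Concatenate forward and backward hidden states for every character."""
    xs = [one_hot(ch, alphabet) for ch in word]
    forward = run_lstm(fwd, xs)
    # Read the word right to left, then flip back so positions line up.
    backward = run_lstm(bwd, xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]

ALPHABET = string.ascii_lowercase + string.digits  # hypothetical character set
HIDDEN = 8                                         # hypothetical hidden size
rng = random.Random(0)
fwd = init_lstm(len(ALPHABET), HIDDEN, rng)
bwd = init_lstm(len(ALPHABET), HIDDEN, rng)

# "recogniti0n" imitates a typical OCR confusion of the letter o with the digit 0.
states = bilstm_encode("recogniti0n", ALPHABET, fwd, bwd)
print(len(states), len(states[0]))
```

A working corrector would additionally need an output layer that maps each concatenated state to a predicted character, plus training on pairs of erroneous and correct words; neither is shown here.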