OCR Error Correction Using BiLSTM
Published in: | 2021 International Conference on Electrical, Computer and Energy Technologies (ICECET) pp. 1 - 5 |
---|---|
Main Authors: | , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 09-12-2021 |
Summary: | Language models are critically important in the pre- and post-processing stages of optical character recognition (OCR). OCR systems depend on the quality of both the documents and the scanners, and lower quality leads to more erroneous output. Long short-term memory (LSTM) networks suit sequences that span long time intervals because they can handle long-term dependencies. In this study, we evaluate the performance of LSTM-based error correction on OCR data. The results show that bidirectional LSTM (BiLSTM) performs well at correcting erroneous words. We obtain 98.13% better performance in correcting erroneous words with OCR'd data and 97.18% better performance with social media data. These results indicate that the applied method can be used for error correction. |
---|---|
DOI: | 10.1109/ICECET52533.2021.9698712 |