File Text Recognition and Management System Based on Tesseract-OCR
Published in: | 2021 3rd International Conference on Applied Machine Learning (ICAML) pp. 236 - 239 |
---|---|
Main Authors: | , , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-07-2021 |
Summary: | Building on research into image preprocessing techniques, this paper designs and implements a web-based archive file recognition and management system using the open-source Tesseract character recognition engine. The system first preprocesses each image with grayscale conversion and binarization. Second, to improve recognition accuracy on handwritten content, we trained Tesseract's text recognition library. Finally, the recognized characters are stored for later use. Archivists can use this system to convert paper documents into electronic documents, which can significantly improve archive management and the efficiency of digitizing the file system. |
---|---|
DOI: | 10.1109/ICAML54311.2021.00057 |
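The summary names grayscale conversion and binarization as the preprocessing steps but does not say which methods the authors used. The sketch below assumes ITU-R BT.601 luma weights for grayscale and Otsu's global threshold for binarization. Both are illustrative choices, not necessarily the paper's. To stay self-contained it uses only the Python standard library and represents an image as nested lists of RGB tuples. That representation, and the function names `to_grayscale`, `otsu_threshold`, and `binarize`, are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # hypothetical (R, G, B) pixel, each 0..255


def to_grayscale(rgb: List[List[Pixel]]) -> List[List[int]]:
    """Convert RGB rows to 8-bit gray using ITU-R BT.601 luma weights (assumed)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb
    ]


def otsu_threshold(gray: List[List[int]]) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method).

    Pixels <= t form the dark (ink) class and pixels > t the light (background) class.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of intensities of those pixels
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break               # light class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1/total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: List[List[int]], t: int) -> List[List[int]]:
    """Map pixels above t to 255 (background) and the rest to 0 (ink)."""
    return [[255 if v > t else 0 for v in row] for row in gray]


# Tiny 2x3 "scan": dark strokes on a light page.
sample = [
    [(250, 250, 250), (20, 20, 20), (245, 245, 245)],
    [(30, 30, 30), (240, 240, 240), (25, 25, 25)],
]
gray = to_grayscale(sample)
t = otsu_threshold(gray)      # → 30
bw = binarize(gray, t)        # → [[255, 0, 255], [0, 255, 0]]
```

In a full pipeline, the binarized image would then go to Tesseract. One option is `pytesseract.image_to_string(image, lang=...)`, where `lang` names the traineddata model, such as one retrained for handwriting as the summary describes. A global Otsu threshold is a simple default. Unevenly lit archive scans may do better with an adaptive (local) threshold.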