Bringing the old writings closer to us: Deep learning and symbolic methods in deciphering old Cyrillic Romanian documents

Bibliographic Details
Published in: Memoirs of the Scientific Sections of the Romanian Academy Vol. XLVI; pp. 87-125
Main Authors: Dan Cristea, Nicolae Cleju, Petru Rebeja, Gabriela Haja, Eduard Coman, Anca Vasilescu, Claudiu Marinescu, Andreea Dascălu
Format: Journal Article
Language: English
Published: Publishing House of the Romanian Academy 01-11-2023
Description
Summary: The paper addresses the problem of transliterating scanned copies of old Romanian books from the Cyrillic script into the Latin script. The motivation for this endeavor and the intended beneficiaries of such a technology are enumerated. Then, a number of peculiarities of these documents, which create difficulties for automatic processing, are exemplified. The proposed technology is presented as a pipeline of modules, each applying AI or symbolic methods. The component parts are then discussed individually, and solutions are presented. The research is presented as work in progress, which leaves room for further enhancements. The data supporting the training and evaluation of the modules originate from the former DeLORo project.
ISSN: 1224-1407, 2343-7049