Lemmatization of Historical Old Literary Finnish Texts in Modern Orthography
Main Authors: | , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 07-07-2021 |
Subjects: | |
Summary: | Texts written in Old Literary Finnish represent the first literary works
written in Finnish, starting from the 16th century. Several projects in Finland
have digitized old publications and made them available for research use.
However, applying modern NLP methods to such data poses great challenges. In
this paper we propose an approach for simultaneously normalizing and
lemmatizing Old Literary Finnish into modern spelling. Our best model reaches
96.3% accuracy on texts written by Agricola and 87.7% accuracy on other
contemporary out-of-domain text. Our method has been made freely available on
Zenodo and GitHub. |
---|---|
DOI: | 10.48550/arxiv.2107.03266 |