Hybrid lemmatization in HuSpaCy

Bibliographic Details
Main Authors: Berkecz, Péter; Orosz, György; Szántó, Zsolt; Szabó, Gergő; Farkas, Richárd
Format: Journal Article
Language: English
Published: 13-06-2023
Description
Summary: Lemmatization remains a non-trivial task for morphologically rich languages. Previous studies have shown that hybrid architectures usually work better for these languages and can yield strong results. This paper presents a hybrid lemmatizer that combines a neural model, dictionaries, and hand-crafted rules. We introduce the hybrid architecture along with empirical results on a widely used Hungarian dataset. The presented methods are published as three HuSpaCy models.
DOI: 10.48550/arxiv.2306.07636