MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages
Main Authors: | |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 23-05-2023 |
Summary: | In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges of annotating POS for these languages under the Universal Dependencies (UD) guidelines. We ran extensive POS baseline experiments using conditional random fields and several multilingual pre-trained language models, and applied various cross-lingual transfer models trained on data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves POS tagging performance on the target languages, particularly when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the target's language family and morphosyntactic properties seems more effective for POS tagging in unseen languages. |
DOI: | 10.48550/arxiv.2305.13989 |