The African Stopwords project: curating stopwords for African languages
Main Authors: | , , , , , , , , , , , , , , , , , , , , , |
Format: | Journal Article |
Language: | English |
Published: | 21-03-2023 |
Summary: | Stopwords are fundamental in Natural Language Processing (NLP) techniques for
information retrieval. One of the common tasks in text preprocessing is the
removal of stopwords. Currently, while high-resource languages like English
benefit from the availability of several stopword lists, low-resource languages,
such as those spoken on the African continent, have no standardized lists
available for use in NLP packages. Stopwords in the context of African
languages are understudied and can reveal information about the crossover
between languages. The African Stopwords project aims to study and
curate stopwords for African languages. In this paper, we present our current
progress on ten African languages as well as future plans for the project. |
DOI: | 10.48550/arxiv.2304.12155 |
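The summary names stopword removal as a common preprocessing step. The following is a minimal sketch of that step, assuming a simple regex tokenizer and a tiny hypothetical Swahili stopword set (illustrative function words chosen for this example, not taken from the project's curated lists). The names `SWAHILI_STOPWORDS`, `tokenize`, and `remove_stopwords` are hypothetical. The filtering itself is just a set-membership test applied to each token.

```python
import re

# Hypothetical mini-list of common Swahili function words, chosen only for
# illustration. This is NOT the list curated by the African Stopwords project.
SWAHILI_STOPWORDS = frozenset({"na", "ya", "wa", "kwa", "ni", "katika"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return runs of Unicode letters as tokens."""
    # [^\W\d_] matches any word character that is not a digit or underscore,
    # i.e. a letter, so punctuation and numbers are dropped.
    return re.findall(r"[^\W\d_]+", text.lower())


def remove_stopwords(tokens: list[str], stopwords: frozenset[str]) -> list[str]:
    """Keep only tokens not found in the stopword set, preserving order."""
    return [tok for tok in tokens if tok not in stopwords]


if __name__ == "__main__":
    # "Schoolchildren read books and newspapers in the library."
    sentence = "Watoto wa shule wanasoma vitabu na magazeti katika maktaba."
    print(remove_stopwords(tokenize(sentence), SWAHILI_STOPWORDS))
```

Running the script prints `['watoto', 'shule', 'wanasoma', 'vitabu', 'magazeti', 'maktaba']`. A `frozenset` gives average constant-time lookups. A real pipeline would substitute a curated, standardized list and a tokenizer suited to the target language.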