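Formalización de reglas para la detección del plural en castellano en el caso de unidades no diccionarizadas [Formalization of rules for detecting the plural in Spanish in the case of units not recorded in dictionaries]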

Bibliographic Details
Published in: Linguamática (Braga, Portugal) Vol. 11; no. 2; p. 17
Main Authors: Nazar, Rogelio, Galdames, Amparo
Format: Journal Article
Language: Spanish
Published: Braga Universidade do Minho, Departamento de Informatica 01-01-2019
Description
Summary: This paper presents a formalization of the rules of plural formation in Spanish for use in the processing of specialized terminology, since terms are frequently missing from general-language dictionaries and therefore cannot be lemmatized or POS-tagged. The absence of terms from general dictionaries has negative effects on tasks such as terminology extraction, particularly in morphologically rich languages. We address the problem with a cascade of multiple transfer rules, regular expressions, and lexical acquisition from large corpora. Results show a significant reduction in the error rate of two POS-taggers, TreeTagger and UDPipe. We offer an open-source implementation that works as a post-process, cleaning up after the tagger.
ISSN: 1647-0818
DOI: 10.21814/lm.11.2.285