The corpus of Basque simplified texts (CBST)

Bibliographic Details
Published in: Language Resources and Evaluation, Vol. 52, no. 1, pp. 217-247
Main Authors: Gonzalez-Dios, Itziar; Aranzabe, María Jesús; Díaz de Ilarraza, Arantza
Format: Journal Article
Language: English
Published: Springer 01-01-2018
Description
Summary: In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural, by a court translator who considers easy-to-read guidelines, and the intuitive, by a teacher based on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. These macro-operations can be further classified into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.
ISSN: 1574-020X, 1572-8412