Bridging Linguistic Gaps: Developing a Greek Text Simplification Dataset

Bibliographic Details
Published in:	Information (Basel) Vol. 15; no. 8; p. 500
Main Authors:	Agathos, Leonidas; Avgoustis, Andreas; Kryelesi, Xristiana; Makridou, Aikaterini; Tzanis, Ilias; Mouratidis, Despoina; Kermanidis, Katia Lida; Kanavos, Andreas
Format:	Journal Article
Language:	English
Published:	Basel MDPI AG 01-08-2024
Subjects:	Annotations; Cognitive ability; Complexity; Comprehension; Cultural differences; dataset creation; Datasets; Greek language; Greek text simplification; Handicapped accessibility; Language; language barriers; linguistic accessibility; Linguistics; Literacy; Machine learning; Machine translation; Multilingualism; Natural language processing; Quality control; R&D; Readability; Research & development; Robust control; Semantics; Simplification; Software; Syntactic islands; Syntax; text complexity; Tourism

Description
Summary:	Text simplification is crucial in bridging the comprehension gap in today’s information-rich environment. Despite advancements in English text simplification, languages with intricate grammatical structures, such as Greek, often remain under-explored. The complexity of Greek grammar, characterized by its flexible syntactic ordering, presents unique challenges that hinder comprehension for native speakers, learners, tourists, and international students. This paper introduces a comprehensive dataset for Greek text simplification, containing over 7500 sentences across diverse topics such as history, science, and culture, tailored to address these challenges. We outline the methodology for compiling this dataset, including a collection of texts from Greek Wikipedia, their annotation with simplified versions, and the establishment of robust evaluation metrics. Additionally, the paper details the implementation of quality control measures and the application of machine learning techniques to analyze text complexity. Our experimental results demonstrate the dataset’s initial effectiveness and potential in reducing linguistic barriers and enhancing communication, with initial machine learning models showing promising directions for future improvements in classifying text complexity. The development of this dataset marks a significant step toward improving accessibility and comprehension for a broad audience of Greek speakers and learners, fostering a more inclusive society.
ISSN:	2078-2489
DOI:	10.3390/info15080500