Grammar Error Correction for Less Resourceful Languages: A Case Study of Sinhala

Bibliographic Details
Published in: 2023 IEEE 17th International Conference on Industrial and Information Systems (ICIIS), pp. 169-174
Main Authors: Jayasuriya, Pradeep; Wijesundara, Malitha; Thelijjagoda, Samantha; Kodagoda, Nuwan
Format: Conference Proceeding
Language: English
Published: IEEE 25-08-2023
Description
Summary: Grammatical Error Correction (GEC) is crucial for improving the readability and comprehension of text. Although substantial advances have been made in this area for widely spoken languages such as English, the development of GEC tools for less common languages such as Sinhala has received inadequate attention. Sinhala, spoken by more than 16 million people in Sri Lanka, is known for its rich morphology and complex grammatical structures, which pose a challenge for GEC systems. This paper presents a novel GEC approach that uses Google machine translation, cross-linguistic knowledge, and rule-based techniques augmented by machine learning to analyze complex Sinhala sentences. We focus on Sinhala verb agreement rules and object validation rules in active voice sentences. We also address the major challenges in Sinhala GEC, such as subject and object detection and the detection of grammatical features of nouns, including animacy, gender, and number. Our findings indicate that the presented GEC methodology achieved an accuracy of 75.61%, while the gender and number detection components achieved accuracies of 90.89% and 92.33%, respectively. These results demonstrate the effectiveness of our approach in identifying and correcting errors in complex Sinhala sentences. The approach is particularly useful for languages with rich morphology and limited annotated data.
ISBN: 9798350323627
DOI: 10.1109/ICIIS58898.2023.10253578