Grammar Error Correction for Less Resourceful Languages: A Case Study of Sinhala
Published in: | 2023 IEEE 17th International Conference on Industrial and Information Systems (ICIIS) pp. 169 - 174 |
---|---|
Main Authors: | , , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 25-08-2023 |
Summary: | Grammatical Error Correction (GEC) is crucial for improving the readability and comprehension of text. Although substantial advances have been made in this area for widely spoken languages such as English, the development of GEC tools for less common languages such as Sinhala has received inadequate attention. Sinhala, spoken by more than 16 million people in Sri Lanka, is known for its rich morphology and complex grammatical structures, which pose a challenge for Sinhala GEC systems. This paper presents a novel GEC approach that uses Google machine translation, cross-linguistic knowledge, and rule-based techniques augmented by machine learning to analyze complex Sinhala sentences. We focus on Sinhala verb agreement rules and object validation rules in active-voice Sinhala sentences. Additionally, we address major challenges in Sinhala GEC, such as subject and object detection and the detection of grammatical features of nouns, including animacy, gender, and number. Our findings indicate that the presented GEC methodology achieved an accuracy of 75.61%. In addition, the gender and number detection components achieved accuracies of 90.89% and 92.33%, respectively. These results demonstrate the effectiveness of our approach in identifying and correcting errors in complex Sinhala sentences. Our approach is particularly useful for languages with rich morphology and limited annotated data. |
---|---|
ISBN: | 9798350323627 |
DOI: | 10.1109/ICIIS58898.2023.10253578 |
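The summary describes rule-based checking of verb agreement against grammatical features detected for the subject. As a minimal sketch of that idea only, assuming the subject's person and number have already been detected (the paper's translation step, detection components, and object validation rules are not reproduced here), the Python below checks a verb's ending against the subject and swaps the ending on a mismatch. The `Agreement` type, the `PRESENT_ENDINGS` table, and `verb_agreement`/`correct_verb` are hypothetical names; the table is a simplified, transliterated subset of literary Sinhala present-tense endings (second-person forms omitted), not the authors' rule set. Gender and animacy, which the summary also lists, would enter a fuller rule set as extra feature fields.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    """Person/number features of a subject or finite verb (hypothetical type)."""

    person: int  # grammatical person: 1 or 3 in the table below
    number: str  # "sg" or "pl"


# Hypothetical, simplified table of transliterated literary Sinhala
# present-tense endings, e.g. kiyami "I say", kiyamu "we say",
# kiyayi "he/she says", kiyati "they say". Not the paper's rule set.
PRESENT_ENDINGS: dict[str, Agreement] = {
    "mi": Agreement(1, "sg"),
    "mu": Agreement(1, "pl"),
    "yi": Agreement(3, "sg"),
    "ti": Agreement(3, "pl"),
}


def verb_agreement(verb: str) -> tuple[str, Agreement] | None:
    """Return (ending, features) marked by the verb's ending, if recognised."""
    for ending, feats in PRESENT_ENDINGS.items():
        if verb.endswith(ending):
            return ending, feats
    return None


def correct_verb(subject: Agreement, verb: str) -> str:
    """Keep the verb if it agrees with the subject; otherwise swap its ending.

    Verbs with an unrecognised ending are returned unchanged (no rule fires).
    """
    found = verb_agreement(verb)
    if found is None:
        return verb
    ending, feats = found
    if feats == subject:
        return verb
    for new_ending, new_feats in PRESENT_ENDINGS.items():
        if new_feats == subject:
            # Replace the mismatching ending with the one the subject requires.
            return verb[: -len(ending)] + new_ending
    return verb  # no ending in the table marks this feature combination


if __name__ == "__main__":
    # A third-person plural subject paired with a third-person singular verb.
    print(correct_verb(Agreement(3, "pl"), "kiyayi"))  # kiyati
    print(correct_verb(Agreement(1, "sg"), "kiyami"))  # kiyami (already agrees)
```

Matching on endings keeps each rule a simple lookup, which suits a rule-based component; a real system would also need stem changes, other tenses, and the gender and animacy checks that a suffix table of this size cannot express.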