A Study of Three Statistical Machine Translation Methods for Myanmar (Burmese) and Shan (Tai Long) Language Pair

Bibliographic Details
Published in: 2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), pp. 1-6
Main Authors: Kyaw, Nang Aeindray; Thu, Ye Kyaw; Nwe, Hlaing Myat; Tar, Phyu Phyu; Min, Nandar Win; Supnithi, Thepchai
Format: Conference Proceeding
Language: English
Published: IEEE 18-11-2020
Description
Summary: The Shan are said to be the second-largest ethnic group of Myanmar. The main motivation of this work is to break down the communication barrier between Shan people and Myanmar people. This paper contributes the first evaluation of the quality of machine translation between Myanmar (Burmese) and Shan (Tai Long). We also built a Myanmar-Shan parallel corpus (around 11K sentences) based on the Myanmar-language data of the ASEAN MT corpus. Three statistical machine translation approaches were used in the experiments: phrase-based, hierarchical phrase-based, and the operation sequence model. Two segmentation schemes were also studied: syllable segmentation and word segmentation. Translating with syllable segmentation achieved higher-quality machine translation for both the Myanmar and Shan languages. BLEU and RIBES scores were used to measure translation performance. The operation sequence model gave the highest scores (41.85 BLEU and 0.88031 RIBES) for Shan-to-Myanmar syllable translation. For Myanmar-to-Shan syllable translation, hierarchical phrase-based machine translation gave the highest BLEU score (34.72) and the operation sequence model gave the highest RIBES score (0.87012). The experiments with syllable segmentation produced promising results even with limited data, and we expect that this work can be developed into a useful translation system as more data becomes available.
DOI: 10.1109/iSAI-NLP51646.2020.9376832
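
To illustrate how a BLEU score of the kind quoted in the summary is computed over segmented text, the following is a minimal sketch. It is not the evaluation script used by the authors, which this record does not identify. It assumes one reference translation per sentence, input already split into tokens (syllables or words, depending on the segmentation scheme), n-grams up to order 4, and no smoothing. The function names `ngrams` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Both arguments are lists of token lists. The tokens may be syllables
    or words; the metric itself does not care which unit was chosen.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    # Assumption: no smoothing, so any order with zero matches gives BLEU 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Placeholder tokens stand in for syllable-segmented sentences.
    hyp = [["a", "b", "c", "d", "e", "f"]]
    ref = [["a", "b", "c", "d", "e", "g"]]
    print(round(corpus_bleu(hyp, ref), 2))
```

Because the score is computed over whatever units the text was split into, BLEU values obtained with syllable segmentation and with word segmentation are measured on different token sequences, which is worth keeping in mind when comparing the two schemes.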