Improving English-to-Indian Language Neural Machine Translation Systems

Most Indian languages lack sufficient parallel data for Machine Translation (MT) training. In this study, we build English-to-Indian language Neural Machine Translation (NMT) systems using the state-of-the-art transformer architecture. In addition, we investigate the utility of back-translation and...

Full description

Saved in:
Bibliographic Details
Published in:Information (Basel) Vol. 13; no. 5; p. 245
Main Authors: Kandimalla, Akshara, Lohar, Pintu, Maji, Souvik Kumar, Way, Andy
Format: Journal Article
Language:English
Published: Basel MDPI AG 01-05-2022
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Most Indian languages lack sufficient parallel data for Machine Translation (MT) training. In this study, we build English-to-Indian language Neural Machine Translation (NMT) systems using the state-of-the-art transformer architecture. In addition, we investigate the utility of back-translation and its effect on system performance. Our experimental evaluation reveals that the back-translation method helps to improve the BLEU scores for both English-to-Hindi and English-to-Bengali NMT systems. We also observe that back-translation is more useful in improving the quality of weaker baseline MT systems. In addition, we perform a manual evaluation of the translation outputs and observe that the BLEU metric cannot always analyse the MT quality as well as humans. Our analysis shows that MT outputs for the English–Bengali pair are actually better than that evaluated by BLEU metric.
ISSN:2078-2489
2078-2489
DOI:10.3390/info13050245