Innovatively Fused Deep Learning with Limited Noisy Data for Evaluating Translations from Poor into Rich Morphology
Published in: | Applied sciences Vol. 11; no. 2; p. 639 |
---|---|
Main Authors: | , , |
Format: | Journal Article |
Language: | English |
Published: | Basel: MDPI AG, 01-01-2021 |
Subjects: | |
Summary: | Evaluation of machine translation (MT) into morphologically rich languages has not been well studied despite its importance. This paper proposes a classifier, that is, a deep learning (DL) schema for MT evaluation, based on different categories of information (linguistic features, natural language processing (NLP) metrics and embeddings), by using a model for machine learning based on noisy and small datasets. The linguistic features are string based for the language pairs English (EN)–Greek (EL) and EN–Italian (IT). The paper also explores the linguistic differences that affect evaluation accuracy between different kinds of corpora. A comparative study between using a simple embedding layer (mathematically calculated) and pre-trained embeddings is conducted. Moreover, an analysis of the impact of feature selection and dimensionality reduction on classification accuracy has been conducted. Results show that using a neural network (NN) model with different input representations produces results that clearly outperform the state-of-the-art for MT evaluation for EN–EL and EN–IT, by an increase of almost 0.40 points in correlation with human judgments on pairwise MT evaluation. It is observed that the proposed algorithm achieved better results on noisy and small datasets. In addition, for a more integrated analysis of the accuracy results, a qualitative linguistic analysis has been carried out in order to address complex linguistic phenomena. |
---|---|
ISSN: | 2076-3417 |
DOI: | 10.3390/app11020639 |