Dealing with Data Sparseness in SMT with Factored Models and Morphological Expansion: a Case Study on Croatian

Bibliographic Details
Published in: Baltic Journal of Modern Computing, Vol. 4, no. 2, p. 354
Main Authors: Sánchez-Cartagena, Víctor M.; Ljubešić, Nikola; Klubička, Filip
Format: Journal Article
Language: English
Published: Riga: University of Latvia, 01-01-2016
Description
Summary: This paper describes our experience using available linguistic resources for Croatian to address data sparseness when building an English-to-Croatian general-domain phrase-based statistical machine translation system. We report the results obtained with factored translation models and morphological expansion, highlight the impact of the algorithm used for tagging the corpora, and show that the improvement brought by these methods is compatible with the application of data selection on out-of-domain parallel corpora.
ISSN: 2255-8942, 2255-8950