Improving Phrase-Based SMT Using Cross-Granularity Embedding Similarity

Bibliographic Details
Published in: Baltic Journal of Modern Computing, Vol. 4, no. 2, p. 129
Main Authors: Passban, Peyman; Hokamp, Chris; Way, Andy; Liu, Qun
Format: Journal Article
Language: English
Published: Riga, University of Latvia, 01-01-2016
Description
Summary: The phrase-based statistical machine translation (PBSMT) model can be viewed as a log-linear combination of translation and language model features. Such a model typically relies on the phrase table as the main resource for bilingual knowledge, which in its most basic form consists of aligned phrases, along with four probability scores. These scores only indicate the co-occurrence of phrase pairs in the training corpus, and not necessarily their semantic relatedness. The basic phrase table is also unable to incorporate contextual information about the segments where a particular phrase tends to occur. In this paper, we define six new features which express the semantic relatedness of bilingual phrases. Our method utilizes both source and target side information to enrich the phrase table. The new features are inferred from a bilingual corpus by a neural network (NN). We evaluate our model on the English-Farsi (En-Fa) and English-Czech (En-Cz) pairs and observe considerable improvements in all En↔Fa and En↔Cz directions.
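Illustrative sketch (not taken from the article): the summary describes a log-linear model over phrase-table scores, extended with features for semantic relatedness. The Python below shows one way such a score could be computed for a single phrase pair. It assumes the four basic scores are the usual two phrase translation probabilities and two lexical weightings, and it adds one hypothetical feature, the cosine similarity of source and target phrase embeddings that are assumed to lie in a shared vector space. The six features actually defined in the paper are not specified in this record, so every feature name, vector, and weight here is a placeholder.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def log_linear_score(features, weights):
    """Log-linear combination: sum over features of weight * log(feature value)."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


# One phrase-table entry with four basic scores (values are made up).
phrase_entry = {
    "p_src_given_tgt": 0.5,    # phrase translation probability, one direction
    "lex_src_given_tgt": 0.4,  # lexical weighting, same direction
    "p_tgt_given_src": 0.6,    # phrase translation probability, other direction
    "lex_tgt_given_src": 0.3,  # lexical weighting, other direction
}

# Hypothetical phrase embeddings, assumed to share one vector space.
src_vec = [0.2, 0.1, 0.7]
tgt_vec = [0.25, 0.05, 0.65]

# Map cosine from [-1, 1] to (0, 1] so its logarithm is defined;
# this rescaling is an assumption of the sketch, not the paper's method.
similarity = cosine_similarity(src_vec, tgt_vec)
phrase_entry["emb_similarity"] = max((1.0 + similarity) / 2.0, 1e-6)

# Uniform placeholder weights; in practice the weights are tuned on held-out data.
weights = {name: 1.0 for name in phrase_entry}

score = log_linear_score(phrase_entry, weights)
print(f"log-linear score: {score:.4f}")
```

Because every feature value lies in (0, 1], each log term is non-positive, so a phrase pair with higher embedding similarity receives a higher (less negative) score when the other features are held fixed.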
ISSN: 2255-8942, 2255-8950