A massively parallel corpus: the Bible in 100 languages

Bibliographic Details
Published in: Language Resources and Evaluation, Vol. 49, no. 2, pp. 375-395
Main Authors: Christodouloupoulos, Christos; Steedman, Mark
Format: Journal Article
Language: English
Published: Dordrecht Springer 01-06-2015
Springer Netherlands
Springer Nature B.V
Description
Summary: We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material, as well as the potential of the Bible as a corpus for natural language processing. Finally, we present a statistical analysis of the corpora collected and a detailed comparison between the English translation and other English corpora.
ISSN: 1574-020X
1572-8412
1574-0218
DOI: 10.1007/s10579-014-9287-y