Social media and NLP tasks: Challenges in crowdsourcing linguistic information

Bibliographic Details
Published in:	2016 11th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), pp. 53-58
Main Authors:	Takoulidou, Eirini; Sosoni, Vilelmini; Kermanidis, Katia; van Zaanen, Menno
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01-10-2016
Subjects:	CrowdFlower; Crowdsourcing; Gold; Machine Translation; MOOC; NLP; Pragmatics; Quality control; Social network services; Syntactics

Description
Summary:	In the framework of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project, data collection tasks for parallel translation are implemented using a crowdsourcing platform. The educational genre (video lecture subtitles, forum discussions, course assignments), the type of text (segmentation, misspellings, syntax errors, specialized terminology, scientific formulas, limited knowledge of context) of the source data, and the multilingual approach of the involved activities (the focus is on a total of 12 European and BRIC languages) provide a challenging setting for the success of the project. Experimental trials reveal significant findings for the purposes of Language Technology research as well as limitations in crowdsourcing linguistic data collections for multilingual tasks.
ISBN:	9781509052455 1509052453
DOI:	10.1109/SMAP.2016.7753384