THINKER - Entity Linking System for Turkish Language

Entity linking is one of the problems to be handled in order to process natural language and to enrich the existing unstructured text with metadata. The generation of assignments between knowledge base entities and lexical units is called entity linking. Although a number of systems have been propos...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on knowledge and data engineering Vol. 30; no. 2; pp. 367 - 380
Main Authors: Kalender, Murat, Korkmaz, Emin Erkan
Format: Journal Article
Language:English
Published: New York IEEE 01-02-2018
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Entity linking is one of the problems to be handled in order to process natural language and to enrich the existing unstructured text with metadata. The generation of assignments between knowledge base entities and lexical units is called entity linking. Although a number of systems have been proposed for linking entity mentions in various languages, there is currently no publicly available entity linking system specific to the Turkish language. This paper presents a novel entity linking system-THINKER - for linking Turkish content with entities defined in the Turkish dictionary (tdk.gov.tr) or Turkish Wikipedia (tr.wikipedia.org). Specifically, we first propose a novel machine learning based entity detection algorithm for the Turkish language. Then, we propose a collective disambiguation algorithm which utilizes a set of metrics for the linking task and, which is optimized using a genetic algorithm. The effectiveness of THINKER is validated empirically over generated data sets. The experimental results show that THINKER outperformed the state-of-the-art cross-lingual and multilingual entity linking systems in the literature. High entity linking performance (74.81 percent F1 score) is achieved by extending previous methods with some features specific to Turkish language and by developing a novel method that can learn better representations of entity embeddings.
ISSN:1041-4347
1558-2191
DOI:10.1109/TKDE.2017.2761743