Minimally Supervised Relation Identification from Wikipedia Articles
Published in: | Journal of information science theory and practice Vol. 6; no. 4; pp. 28 - 38 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | Daejeon: Korea Institute of Science and Technology Information (한국과학기술정보연구원), 01-12-2018 |
Subjects: | |
Summary: | Wikipedia is composed of millions of articles, each of which describes a particular real-world entity in one of many languages. Because the articles are contributed and edited by a large population of diverse experts with no specific authority, Wikipedia can be seen as a naturally occurring body of human knowledge. In this paper, we propose a method to automatically identify key entities and relations in Wikipedia articles, which can be used for automatic ontology construction. Compared to previous approaches to entity and relation extraction and/or identification from text, our goal is to capture naturally occurring entities and relations from Wikipedia while minimizing the artificiality often introduced when constructing training and test data. The titles of the articles and the anchored phrases in their text are regarded as entities, and their types are automatically classified with minimal training. We attempt to automatically detect and identify possible relations among the entities by clustering, without training data. This contrasts with the relation extraction approach, which focuses on improving accuracy in selecting one of several target relations for a given pair of entities. While relation extraction with supervised learning requires significant annotation effort for a predefined set of relations, our approach attempts to discover relations as they occur naturally. Unlike other unsupervised relation identification work, in which automatically identified relations are evaluated against correct relations determined a priori by human judges, we attempt to evaluate the appropriateness of the naturally occurring clusters of relations involving person-artifact and person-organization entities, together with their relation names. |
---|---|
Bibliography: | KISTI1.1003/JNL.JAKO201810760747196 https://www.koreascience.kr/article/JAKO201810760747196.pdf |
ISSN: | 2287-9099 2287-4577 |
DOI: | 10.1633/JISTaP.2018.6.4.3 |
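
The summary above treats article titles and anchored phrases as entities and groups candidate relations by clustering, without training data. The Python sketch below illustrates that general idea only; it is not the authors' method. The `[[Target|label]]` anchor markup, the bag-of-words context, the Jaccard similarity, the greedy clustering step, the 0.5 threshold, and all function names are assumptions made for this toy example.

```python
import re

# Assumed input markup: wiki-style anchors, [[Target]] or [[Target|label]].
ANCHOR = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def entity_pair_contexts(title, sentences):
    """Pair the article title with each anchored entity in a sentence.

    The context of a pair is the set of lowercased words of the sentence
    that remain after the anchor markup is removed (an assumed, very
    crude stand-in for real relation features).
    """
    pairs = []
    for sentence in sentences:
        stripped = ANCHOR.sub(" ", sentence).lower()
        context = frozenset(re.findall(r"[a-z]+", stripped))
        for match in ANCHOR.finditer(sentence):
            pairs.append(((title, match.group(1).strip()), context))
    return pairs


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(pairs, threshold=0.5):
    """Greedy single-pass clustering of entity pairs by context.

    A pair joins the first cluster whose seed context is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_context, [entity pairs])
    for pair, context in pairs:
        for seed, members in clusters:
            if jaccard(seed, context) >= threshold:
                members.append(pair)
                break
        else:
            clusters.append((context, [pair]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Toy input: one person-artifact pattern and one person-organization pattern.
    sentences = [
        "He wrote [[The Old Man and the Sea]].",
        "He wrote [[A Farewell to Arms]].",
        "He worked for [[Toronto Star|the Star]].",
    ]
    for members in cluster(entity_pair_contexts("Ernest Hemingway", sentences)):
        print(members)
```

On this toy input the two "wrote" sentences share a context and fall into one cluster, while the "worked for" sentence forms its own. Naming each cluster (for example, from its most frequent context words) is left out of the sketch.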