Towards automatic document classification by exploiting only knowledge resources

Bibliographic Details
Published in: 2015 34th International Conference of the Chilean Computer Science Society (SCCC), pp. 1-6
Main Authors: Cardoso da Silva, Gleidson Antonio; Dorneles, Carina F.
Format: Conference Proceeding
Language: English
Published: IEEE, 01-11-2015
Description
Summary: Document classification is critical for optimizing information retrieval tasks, especially on the web. In this environment, the open-domain nature and growing volume of available data remain challenges for the classification task. In this paper, we address these problems using only knowledge resources. Our approach relies on concept instances derived from the document and on an open-domain knowledge base for concept generalization. The resulting set of broader concepts is ranked according to a disparity value, and the best-placed concept is taken as the document's class label. Experimental results on real-world datasets show that this approach can classify documents without building an ontology or training and maintaining a classification model.
DOI: 10.1109/SCCC.2015.7416573
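The pipeline described in the summary (derive concept instances, generalize them to broader concepts via a knowledge base, rank the broader concepts, and take the top one as the label) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy knowledge base `TOY_KB`, the helpers `extract_instances`, `score`, and `classify`, and the simple token-matching instance extraction are all hypothetical. Because the record does not define the disparity value, a plain coverage fraction is swapped in as a stand-in ranking score.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base: concept instance -> broader concepts.
# A real system would query an open-domain knowledge base instead.
TOY_KB = {
    "python": ["programming_language"],
    "java": ["programming_language"],
    "haskell": ["programming_language"],
    "compiler": ["software_tool"],
}


def extract_instances(text, kb):
    """Return lowercase tokens of `text` that appear as instances in `kb`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in kb]


def score(concept_counts, n_instances):
    """Stand-in score, NOT the paper's disparity value: the fraction of
    extracted instances that generalize to each broader concept."""
    return {c: k / n_instances for c, k in concept_counts.items()}


def classify(text, kb):
    """Label a document with its best-placed broader concept, or None."""
    instances = extract_instances(text, kb)
    if not instances:
        return None
    # Generalize every instance to its broader concepts and count them.
    counts = Counter(c for inst in instances for c in kb[inst])
    scores = score(counts, len(instances))
    # Highest score wins; ties are broken alphabetically for determinism.
    return min(scores, key=lambda c: (-scores[c], c))


if __name__ == "__main__":
    doc = "Python and Java code is built by a compiler."
    print(classify(doc, TOY_KB))
```

For the sample sentence, the instances `python`, `java`, and `compiler` generalize to `programming_language` twice and `software_tool` once, so `programming_language` is chosen. Note that no model is trained: the label comes entirely from the knowledge resource and the ranking step, which is the property the summary emphasizes.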