Towards automatic document classification by exploiting only knowledge resources
Published in: | 2015 34th International Conference of the Chilean Computer Science Society (SCCC) pp. 1 - 6 |
---|---|
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-11-2015 |
Summary: | Document classification is critical for optimizing information retrieval tasks, especially on the web. In this environment, the open-domain nature and growing volume of available data remain a challenge for the classification task. In this paper, we address these problems using only knowledge resources. Our approach relies on concept instances derived from the document and an open-domain knowledge base for concept generalization. The set of broader concepts is ranked according to a disparity value, and the top-ranked concept is taken as the document's class label. Experimental results on real-world datasets show that this approach can achieve document classification without the need to build an ontology or to train and maintain a classification model. |
---|---|
DOI: | 10.1109/SCCC.2015.7416573 |
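
The pipeline outlined in the summary (extract concept instances, generalize them through a knowledge base, rank the broader concepts, take the top one as the label) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `TOY_KB` is a hypothetical hand-written stand-in for an open-domain knowledge base, and the ranking uses a simple coverage score because this record does not define the paper's disparity value. It is not the authors' implementation.

```python
from collections import Counter

# Hypothetical toy knowledge base (assumption): instance -> broader concepts.
# A real system would query an open-domain knowledge base instead.
TOY_KB = {
    "python": ["programming language", "snake"],
    "java": ["programming language", "island"],
    "rust": ["programming language", "corrosion"],
    "compiler": ["software"],
}


def extract_instances(text, kb):
    """Return the document tokens that are known instances in the KB."""
    tokens = [t.strip(".,;:!?()").lower() for t in text.split()]
    return [t for t in tokens if t in kb]


def rank_broader_concepts(instances, kb):
    """Rank broader concepts by the share of distinct instances they cover.

    This coverage score is a placeholder (assumption); the paper ranks
    concepts by a disparity value that this record does not define.
    """
    distinct = set(instances)
    support = Counter()
    for inst in distinct:
        for concept in kb[inst]:
            support[concept] += 1
    scores = {c: n / len(distinct) for c, n in support.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def classify(text, kb=TOY_KB):
    """Label a document with its top-ranked broader concept, or None."""
    instances = extract_instances(text, kb)
    if not instances:
        return None
    return rank_broader_concepts(instances, kb)[0][0]


if __name__ == "__main__":
    print(classify("Python, Java and Rust each ship a compiler."))
```

The sketch needs no training step and no hand-built ontology, which mirrors the claim in the summary; the quality of the label depends entirely on the knowledge base and on the ranking function that replaces the placeholder score.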