Using complex networks for text classification: Discriminating informative and imaginative documents

Bibliographic Details
Published in: Europhysics Letters, Vol. 113, no. 2, p. 28007
Main Authors: de Arruda, Henrique F., Costa, Luciano da F., Amancio, Diego R.
Format: Journal Article
Language: English
Published: Les Ulis: EDP Sciences, IOP Publishing and Società Italiana di Fisica, 01-01-2016
Description
Summary: Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques has improved several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantic content of texts, as is the case with bag-of-words language models. These approaches have certainly yielded reasonable performance. However, some potential features, such as the structural organization of texts, have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aimed at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than that of similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts.
Bibliography:publisher-ID:epl17660
ark:/67375/80W-NN2MF63S-8
istex:39EB4E5353B88D11E1CA235753128E829CA2F92B
ISSN: 0295-5075, 1286-4854
DOI: 10.1209/0295-5075/113/28007