Evaluating keyword selection methods for WEBSOM text archives

Bibliographic Details
Published in:	IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 3, pp. 380-383
Main Authors:	Azcarraga, A.P.; Yap, T.N., Jr.; Tan, J.; Chua, T.S.
Format:	Journal Article
Language:	English
Published:	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), March 2004
Subjects:	Archives; Classification algorithms; Clustering algorithms; Computer Society; Frequency; Labels; Mathematical analysis; Methodology; Navigation; News; Text categorization; Texts; Vectors (mathematics); Weight reduction

Description
Summary:	The WEBSOM methodology, proven effective for building very large text archives, includes a method that extracts labels for each document cluster assigned to a node in the map. However, the WEBSOM method must retrieve every word of every document associated with each node. Since maps may have more than 100,000 nodes and the archive may contain up to seven million documents, the WEBSOM methodology needs a faster alternative method for keyword selection. Presented here is such an alternative method, which quickly deduces meaningful labels for each node in the map. It does so solely by analyzing the relative weight distribution of the SOM weight vectors and by exploiting certain characteristics of the random projection method used for dimensionality reduction. The effectiveness of this technique is demonstrated on news document collections.
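The summary says labels are deduced from the SOM weight vectors and the properties of the random projection alone, without revisiting the documents. The sketch below illustrates one way such a shortcut could work. It is a hypothetical illustration, not the authors' published procedure, and it rests on assumptions not stated in this record: each word is encoded by a sparse random code (a few dimensions set to 1), and a word's score at a node is the average ratio of that node's weight to the map-wide mean weight over the word's dimensions. All function names are hypothetical, and the node weights in the usage example are stood in by document averages rather than trained SOM weights.

```python
import random


def make_sparse_projection(vocab, dim, ones_per_word=3, seed=0):
    """Hypothetical sparse random projection: each word is assigned
    `ones_per_word` distinct random dimensions out of `dim`."""
    rng = random.Random(seed)
    return {word: rng.sample(range(dim), ones_per_word) for word in vocab}


def project(doc_words, projection, dim):
    """Reduced document vector: the sum of the codes of its words."""
    vec = [0.0] * dim
    for word in doc_words:
        for d in projection.get(word, []):
            vec[d] += 1.0
    return vec


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def label_node(node_weights, mean_weights, projection, top_k=3):
    """Score each word by the mean ratio of the node's weight to the
    map-wide mean weight over the dimensions the word projects onto,
    then return the top_k highest-scoring words. Only weight vectors
    are read; no document is revisited."""
    scores = {}
    for word, dims in projection.items():
        ratios = [node_weights[d] / mean_weights[d]
                  for d in dims if mean_weights[d] > 0]
        if ratios:
            scores[word] = sum(ratios) / len(ratios)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    vocab = ["stock", "market", "shares", "goal", "match", "league"]
    dim = 40
    proj = make_sparse_projection(vocab, dim)
    # Stand-in for trained SOM weights: the mean of the documents per node.
    node_docs = [
        [["stock", "market"], ["shares", "market"]],
        [["goal", "match"], ["league", "goal"]],
    ]
    nodes = [mean_vector([project(d, proj, dim) for d in docs])
             for docs in node_docs]
    map_mean = mean_vector(nodes)
    for i, w in enumerate(nodes):
        print(i, label_node(w, map_mean, proj))
```

Because random codes can collide, a word sharing dimensions with another node's vocabulary may receive a diluted score; a larger `dim` or fewer `ones_per_word` reduces such collisions in this sketch.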
ISSN:	1041-4347 (print); 1558-2191 (electronic)
DOI:	10.1109/TKDE.2003.1262193