Comparison of collocation extraction measures for document indexing

Bibliographic Details
Published in:	28th International Conference on Information Technology Interfaces, 2006, pp. 451-456
Main Authors:	Petrovic, S., Snajder, J., Dalbelo-Basic, B., Kolar, M.
Format:	Conference Proceeding
Language:	English
Published:	IEEE 2006
Subjects:	Cancer; Computational linguistics; Data mining; Guns; Indexing; Law; Legal factors; Natural language processing; Statistics; Stock markets

Description
Summary:	Automatic extraction of collocations from a corpus is a well-known problem in the field of natural language processing. It is typically carried out by employing a statistical measure that indicates whether or not two words occur together more often than would be expected by chance. As there is an abundance of these measures proposed by various authors, we have compared some of them on the task of extracting collocations from a corpus of Croatian legal documents for the purpose of document indexing. We propose and evaluate extensions of these measures for collocations consisting of three words.
ISBN:	9789537138059 9537138054
ISSN:	1330-1012
DOI:	10.1109/ITI.2006.1708523
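
Illustration:	The summary refers to statistical measures that indicate whether two words occur together more often than by chance. As a minimal sketch of that idea, the Python snippet below scores adjacent word pairs with pointwise mutual information (PMI). PMI is chosen here only as one commonly used association measure; this record does not state which measures the paper compares, and the function name `pmi_scores` and the toy token list are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Score each adjacent word pair by pointwise mutual information.

    Assumption: PMI stands in for "a statistical measure"; it is not
    taken from the catalogued paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)               # number of word positions
    n_bi = max(len(tokens) - 1, 1)    # number of adjacent pairs
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi           # observed pair probability
        p_x = unigrams[w1] / n_uni    # marginal probability of first word
        p_y = unigrams[w2] / n_uni    # marginal probability of second word
        # Positive values mean the pair co-occurs more often than
        # independence of the two words would predict.
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy corpus, for illustration only.
tokens = ["supreme", "court", "ruled", "the",
          "supreme", "court", "the", "law"]
scores = pmi_scores(tokens)
```

On this toy input, the pair ("supreme", "court") scores higher than ("court", "the"). PMI is known to favor rare pairs, which is one reason several measures are usually compared rather than relying on a single one.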