Automating keyphrase extraction with multi-objective genetic algorithms

Keyphrases have been used extensively in IR systems to facilitate information exchange, organize information and assist information retrieval. Automation of keyphrase generation is essential for the timely creation of keyphrases for large repositories in new domains where previous thesauri do not ex...

Full description

Saved in:
Bibliographic Details
Published in:37th Annual Hawaii International Conference on System Sciences, 2004. Proceedings of the p. 8 pp.
Main Authors: Jia-Long Wu, Agogino, A.M.
Format: Conference Proceeding
Language:English
Published: IEEE 2004
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Keyphrases have been used extensively in IR systems to facilitate information exchange, organize information and assist information retrieval. Automation of keyphrase generation is essential for the timely creation of keyphrases for large repositories in new domains where previous thesauri do not exist or for metacollections in which keyphrases that are meaningful across disparate collections are needed. In this paper we propose an automated keyphrase extraction algorithm using a non-dominated sorting multi-objective genetic algorithm. The "clumping" property of keyphrases is used to judge the appropriateness of a phrase and is quantified by a condensation clustering measure proposed by Bookstein. The objective is to find the smallest phrase set that has the best precision, as measured by average condensation clustering. Keyphrases were retrieved from a collection of design conference papers and the results were presented to domain experts for evaluation. Ninety percent of the generated phrases were deemed appropriate for use in a thesaurus for engineering design.
ISBN:0769520561
9780769520568
DOI:10.1109/HICSS.2004.1265278