Topic detection with recursive consensus clustering and semantic enrichment

Bibliographic Details
Published in: Humanities & social sciences communications Vol. 10; no. 1; pp. 197 - 10
Main Authors: De Leo, Vincenzo, Puliga, Michelangelo, Bardazzi, Marco, Capriotti, Filippo, Filetti, Andrea, Chessa, Alessandro
Format: Journal Article
Language: English
Published: London Palgrave Macmillan 01-12-2023
Springer Nature
Description
Summary: Extracting meaningful information from short texts such as tweets has proved to be a challenging task. The literature on topic detection focuses mostly on methods that try to guess the plausible words describing topics whose number has been decided in advance. Topics change according to the initial setup of the algorithms and show a consistent instability, with words moving from one topic to another. In this paper we propose an iterative procedure for topic detection that searches for the most stable solutions in terms of the words describing a topic. The procedure combines clustering on the consensus matrix with traditional topic detection to find both a stable set of words and an optimal number of topics. We observe, however, that in several cases the procedure does not converge to a unique value but oscillates. We further enhance the methodology using semantic enrichment via word embeddings, with the aim of reducing noise and improving topic separation. We foresee the application of this set of techniques to automatic topic discovery in noisy channels such as Twitter or other social media.
ISSN:2662-9992
DOI:10.1057/s41599-023-01711-0
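
Illustrative note: the summary above mentions clustering on a consensus matrix built from repeated topic-detection runs. The sketch below is not the authors' implementation; it only shows the general idea under stated assumptions. Each run is assumed to be available as a mapping from word to topic label, the function names `consensus_matrix` and `stable_groups` and the `threshold` parameter are hypothetical, and the grouping step (connected components over a thresholded matrix) is a simple stand-in for whatever clustering the paper actually uses. The recursive part of the procedure and the word-embedding enrichment are not covered.

```python
from itertools import combinations


def consensus_matrix(runs):
    """Return (words, matrix) where matrix[i][j] is the fraction of runs
    in which words[i] and words[j] were assigned to the same topic.

    `runs` is a list of dicts mapping word -> topic label, one dict per run
    (an assumed input format, not taken from the paper).
    """
    words = sorted(set().union(*runs))
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    counts = [[0.0] * n for _ in range(n)]
    for assignment in runs:
        # Count every pair of words that shares a topic label in this run.
        for a, b in combinations(assignment, 2):
            if assignment[a] == assignment[b]:
                i, j = index[a], index[b]
                counts[i][j] += 1
                counts[j][i] += 1
    matrix = [[c / len(runs) for c in row] for row in counts]
    for i in range(n):
        matrix[i][i] = 1.0  # a word always co-occurs with itself
    return words, matrix


def stable_groups(words, matrix, threshold=0.8):
    """Group words whose pairwise consensus is at least `threshold`.

    Groups are the connected components of the thresholded matrix,
    a deliberately simple stand-in for a real clustering step.
    """
    n = len(words)
    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            i = stack.pop()
            component.append(words[i])
            for j in range(n):
                if j not in seen and matrix[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(component))
    return sorted(groups)
```

In this sketch the number of groups is not fixed in advance: it falls out of how often words stay together across runs, and lowering or raising the hypothetical `threshold` merges or splits groups.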