A Clustering Method of Case-Involved News by Combining Topic Network and Multi-Head Attention Mechanism

Finding the news of same case from the large numbers of case-involved news is an important basis for public opinion analysis. Existing text clustering methods usually based on topic models which only use topic and case infomation as the global features of documents, so distinguishing between differe...

Full description

Saved in:
Bibliographic Details
Published in:Sensors (Basel, Switzerland) Vol. 21; no. 22; p. 7501
Main Authors: Mao, Cunli, Liang, Haoyuan, Yu, Zhengtao, Huang, Yuxin, Guo, Junjun
Format: Journal Article
Language:English
Published: Basel MDPI AG 11-11-2021
MDPI
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Finding the news of same case from the large numbers of case-involved news is an important basis for public opinion analysis. Existing text clustering methods usually based on topic models which only use topic and case infomation as the global features of documents, so distinguishing between different cases with similar types remains a challenge. The contents of documents contain rich local features. Taking into account the internal features of news, the information of cases and the contributions provided by different topics, we propose a clustering method of case-involved news, which combines topic network and multi-head attention mechanism. Using case information and topic information to construct a topic network, then extracting the global features by graph convolution network, thus realizing the combination of case information and topic information. At the same time, the local features are extracted by multi-head attention mechanism. Finally, the fusion of global features and local features is realized by variational auto-encoder, and the learned latent representations are used for clustering. The experiments show that the proposed method significantly outperforms the state-of-the-art unsupervised clustering methods.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1424-8220
1424-8220
DOI:10.3390/s21227501