Research on the TF–IDF algorithm combined with semantics for automatic extraction of keywords from network news texts
Published in: | Journal of intelligent systems Vol. 33; no. 1; pp. 455 - 65 |
---|---|
Main Author: | |
Format: | Journal Article |
Language: | English |
Published: | Berlin, De Gruyter, 18-07-2024, Walter de Gruyter GmbH |
Subjects: | |
Summary: | As the volume of online news text continues to grow, automatic keyword extraction has become key to helping users reach the content they want quickly. This article first introduces two common algorithms: term frequency–inverse document frequency (TF–IDF) and TextRank. It then adds a news title weight to the TF–IDF calculation, reflecting the characteristics of network news text, and designs a new automatic extraction algorithm that uses Word2vec to capture semantics. In experiments on the ACE2005 dataset, as the number of automatically extracted keywords increased, the accuracy of the TF–IDF, TextRank, and semantics-combined TF–IDF algorithms gradually decreased while their recall rates gradually increased. The gap between the semantics-combined TF–IDF algorithm and the other two was largest when five keywords were extracted, where its accuracy, recall rate, and F-measure were 72.77, 78.64, and 75.59%, respectively. Finally, the F-measure of the semantics-combined TF–IDF algorithm reached 81% on network news texts. These results support the effectiveness of the semantics-combined TF–IDF algorithm for automatically extracting keywords from network news texts and suggest it has promising practical applications. |
---|---|
ISSN: | 2191-026X 0334-1860 |
DOI: | 10.1515/jisys-2023-0300 |
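
The Summary describes adding a title weight to TF–IDF and reports accuracy, recall, and F-measure figures. The sketch below is only an illustration of those two ideas, not the article's method: the record does not give the weighting formula, so the multiplicative `title_boost`, the smoothed IDF, and the function names `title_weighted_tfidf` and `f_measure` are all assumptions made here. The Word2vec semantic step is omitted. The F-measure is assumed to be the balanced harmonic mean of the two reported rates.

```python
import math
from collections import Counter


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of the two rates (assumed form)."""
    return 2 * precision * recall / (precision + recall)


def title_weighted_tfidf(title, body, corpus, title_boost=2.0, top_k=5):
    """Rank body terms by TF-IDF, boosting terms that also occur in the title.

    title, body: token lists for one news item.
    corpus: list of token lists used to estimate document frequencies.
    title_boost: hypothetical multiplier, not a value taken from the article.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each term once per document

    term_freq = Counter(body)
    title_terms = set(title)
    scores = {}
    for term, count in term_freq.items():
        tf = count / len(body)
        # Smoothed IDF so unseen or universal terms never divide by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
        boost = title_boost if term in title_terms else 1.0
        scores[term] = tf * idf * boost
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


corpus = [
    ["flood", "hits", "city", "city"],
    ["market", "rises", "city"],
    ["team", "wins", "final"],
]
print(title_weighted_tfidf(["flood", "warning"], corpus[0], corpus, top_k=2))
print(round(f_measure(72.77, 78.64), 2))
```

On this toy corpus the title term "flood" outranks the more frequent "city" only because of the boost, which is the effect a title weight is meant to have. Under the harmonic-mean assumption, the reported accuracy of 72.77% and recall of 78.64% give about 75.59, which agrees with the F-measure quoted in the Summary.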