Automatic content extraction and time-aware topic clustering for large-scale social network on cloud platform
Published in: | The Journal of supercomputing Vol. 75; no. 5; pp. 2890 - 2924 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | New York, Springer US, 01-05-2019, Springer Nature B.V |
Subjects: | |
Summary: | In recent years, as the number of users has grown, social networks have taken on the characteristics of big data, and large-scale social networks have become an indispensable part of people’s lives. However, traditional data mining techniques are not suited to large-scale social networks, so more suitable mining techniques are urgently needed. This work first proposes a crawler model based on semantic analysis and spatial clustering. A content extraction model based on the document object model (DOM) tree is then built to extract the target text from the links fetched by the crawler; similarities between the textual information in different regions are computed to select the important information. In addition, a two-stage topic clustering model based on time information is presented, in which time information is incorporated into the similarity computation between two posts or clusters, and the single-pass algorithm is improved and applied in the different clustering stages to improve clustering accuracy. Finally, the proposed algorithms are evaluated on the Hadoop platform. The Hadoop platform can effectively reduce computing time and improve the quality of service for users of large-scale social networks, and the experiments demonstrate that the proposed algorithms are suitable for data processing in large-scale social networks. |
---|---|
ISSN: | 0920-8542 1573-0484 |
DOI: | 10.1007/s11227-018-2704-z |