TopicSketch: Real-Time Bursty Topic Detection from Twitter

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 28, no. 8, pp. 2216-2229
Main Authors: Xie, Wei; Zhu, Feida; Jiang, Jing; Lim, Ee-Peng; Wang, Ke
Format: Journal Article
Language: English
Published: New York, IEEE, 01-08-2016
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: Twitter has become one of the largest microblogging platforms, where users around the world share what is happening around them with friends and beyond. A bursty topic in Twitter is one that triggers a surge of relevant tweets within a short period of time, which often reflects important events of mass interest. How to leverage Twitter for early detection of bursty topics has therefore become an important research problem with immense practical value. Despite the wealth of research on topic modelling and analysis in Twitter, detecting bursty topics in real time remains a challenge. As existing methods can hardly scale to handle the tweet stream in real time, we propose in this paper TopicSketch, a sketch-based topic model together with a set of techniques to achieve real-time detection. We evaluate our solution on a tweet stream with over 30 million tweets. Our experimental results show both the efficiency and the effectiveness of our approach. In particular, they demonstrate that TopicSketch on a single machine can potentially handle hundreds of millions of tweets per day, which is on the same scale as the total number of daily tweets in Twitter, and can present bursty events at a finer granularity.
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2016.2556661