Statistical Features-Based Real-Time Detection of Drifted Twitter Spam

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, Vol. 12, no. 4, pp. 914-925
Main Authors: Chen, Chao, Wang, Yu, Zhang, Jun, Xiang, Yang, Zhou, Wanlei, Min, Geyong
Format: Journal Article
Language: English
Published: IEEE 01-04-2017
Description
Summary: Twitter spam has become a critical problem. Recent work applies machine learning techniques to Twitter spam detection, making use of the statistical features of tweets. In our labeled tweet data set, however, we observe that the statistical properties of spam tweets vary over time, and thus the performance of existing machine learning-based classifiers decreases. This issue is referred to as "Twitter Spam Drift". To tackle this problem, we first carry out a deep analysis of the statistical features of one million spam tweets and one million non-spam tweets, and then propose a novel Lfun scheme. The proposed scheme can discover "changed" spam tweets from unlabeled tweets and incorporate them into the classifier's training process. A number of experiments are performed to evaluate the proposed scheme. The results show that the proposed Lfun scheme can significantly improve spam detection accuracy in real-world scenarios.
ISSN: 1556-6013, 1556-6021
DOI: 10.1109/TIFS.2016.2621888