Crawling Online Social Graphs

Bibliographic Details
Published in:	2010 12th International Asia-Pacific Web Conference, pp. 236-242
Main Authors:	Shaozhi Ye, Juan Lang, Felix Wu
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01-04-2010
Subjects:	Algorithm design and analysis; Computer science; Crawlers; Data analysis; Data privacy; graph sampling; Large-scale systems; online social networks; Protection; Sampling methods; Social network services; Web and internet services

Description
Summary:	Extensive research has been conducted on online social networks (OSNs), while little attention has been paid to the data collection process. Because of the large scale of OSNs and their privacy control policies, a partial data set is often used for analysis. Which data end up in that set depends on many factors, including the choice of seeds, the node selection algorithm, and the sample size. These factors may introduce biases that contaminate or even skew the results. To evaluate the impact of different factors, this paper examines the OSN graph crawling problem, where the nodes are OSN users and the edges are the links (or relationships) among these users. More specifically, by looking at various factors in the crawling process, the paper addresses the following problems: 1) Efficiency: how fast different crawlers discover nodes and links; 2) Sensitivity: how different OSNs and the number of protected users affect crawlers; 3) Bias: how major graph properties are skewed. To the best of our knowledge, our simulations on four real-world online social graphs provide the first in-depth empirical answers to these questions.
ISBN:	9781769540122 9781424465996 1424465990
DOI:	10.1109/APWeb.2010.10
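
Illustration:	The crawling problem described in the summary can be sketched as a small simulation. The code below is only a minimal sketch, not the authors' crawler: it assumes breadth-first search as the node selection algorithm, an undirected graph held in memory as a dictionary, and a simple query budget standing in for the sample size. The function name bfs_crawl, its parameters, and the toy graph are all hypothetical and chosen for illustration.

```python
from collections import deque


def bfs_crawl(graph, seeds, protected=frozenset(), budget=None):
    """Simulate a breadth-first crawl over an undirected graph.

    graph     -- dict mapping each user to an iterable of neighbours
    seeds     -- users the crawl starts from
    protected -- users whose neighbour lists cannot be fetched (assumption:
                 they stay visible as nodes in other users' lists)
    budget    -- maximum number of users to query (None means no limit)
    Returns (discovered_nodes, discovered_edges, queried_count).
    """
    discovered = set(seeds)
    edges = set()
    queue = deque(seeds)
    queried = 0

    while queue and (budget is None or queried < budget):
        user = queue.popleft()
        if user in protected:
            # The crawler knows this user exists but cannot read
            # its neighbour list, so nothing new is learned here.
            continue
        queried += 1
        for friend in graph.get(user, ()):
            # Store each link once, regardless of direction.
            edges.add(frozenset((user, friend)))
            if friend not in discovered:
                discovered.add(friend)
                queue.append(friend)
    return discovered, edges, queried


# Toy graph, made up for illustration only.
toy = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

open_nodes, open_edges, open_queries = bfs_crawl(toy, ["a"])
shut_nodes, shut_edges, shut_queries = bfs_crawl(toy, ["a"], protected={"d"})
small_nodes, small_edges, small_queries = bfs_crawl(toy, ["a"], budget=1)
```

In this toy graph, protecting user "d" hides user "e" and the link between them, which is the kind of sensitivity to protected users that the summary refers to. Comparing the node and edge counts across different seeds, budgets, or selection rules gives a rough feel for the efficiency and bias questions as well.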