Data Set Synthesis Based on Known Correlations and Distributions for Expanded Social Graph Generation

Bibliographic Details
Published in: IEEE Access, Vol. 8, pp. 33013-33022
Main Authors: Petricioli, Lucija, Humski, Luka, Vranic, Mihaela, Pintar, Damir
Format: Journal Article
Language:English
Published: Piscataway IEEE 2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: Data created through the use of various services are usually not available to the average researcher. Security and privacy have become top concerns, further restricting access to certain real-life data, especially for social networks. Synthetic data generators, and synthetic social graph generators in particular, have therefore become an important area of research. However, such generators still mostly create graphs that record only whether two nodes are connected. An existing conceptual solution for an expanded social graph generator aims to generate synthetic graphs with multiple weighted edges between nodes, representing various types of relationships among those nodes, all based on known real-life data characteristics. One of its proposed steps is the generation of the necessary data according to provided distributions and correlations. This paper focuses on that step: it adapts an existing iterative algorithm for non-normal multivariate data simulation to generate synthetic data based on the publicly available distributions and correlations of Facebook interaction parameters. The characteristics of the generated synthetic data are shown to be similar to the known characteristics of the real-life data, showing that the chosen algorithm, with the accompanying alterations, can be used as one of the steps in generating a synthetic expanded social graph.
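
The summary does not name the specific algorithm or the alterations the authors made, so the following is only a minimal sketch of the general idea of iteratively simulating non-normal multivariate data with given marginals and target correlations. It is not the paper's method. The function name simulate_correlated, its parameters, and the example marginals and correlation matrix are all hypothetical choices made for illustration. The sketch draws correlated normal data, maps each column by rank onto a sample from the desired marginal, and then nudges the intermediate correlation matrix toward the target until the achieved correlations are close.

```python
import numpy as np


def simulate_correlated(marginals, target_corr, n_iter=20, seed=0):
    """Illustrative sketch (hypothetical helper, not the paper's algorithm).

    marginals   : list of k 1-D arrays of equal length n, each a sample
                  from the desired marginal distribution of one variable
    target_corr : k x k target Pearson correlation matrix
    Returns the best data set found and its largest absolute correlation error.
    """
    rng = np.random.default_rng(seed)
    sorted_marginals = [np.sort(np.asarray(m, dtype=float)) for m in marginals]
    n, k = len(sorted_marginals[0]), len(sorted_marginals)
    target = np.asarray(target_corr, dtype=float)
    intermediate = target.copy()
    # Fixed noise, so that successive iterations are directly comparable.
    z = rng.standard_normal((n, k))
    best, best_err = None, np.inf
    for _ in range(n_iter):
        # Repair the intermediate matrix into a valid correlation matrix.
        w, v = np.linalg.eigh(intermediate)
        psd = v @ np.diag(np.clip(w, 1e-8, None)) @ v.T
        d = np.sqrt(np.diag(psd))
        psd = psd / np.outer(d, d)
        # Correlated normal data with the intermediate correlations.
        normal = z @ np.linalg.cholesky(psd).T
        # Rank-map each column onto its target marginal sample.
        data = np.empty_like(normal)
        for j in range(k):
            ranks = normal[:, j].argsort().argsort()
            data[:, j] = sorted_marginals[j][ranks]
        achieved = np.corrcoef(data, rowvar=False)
        err = np.abs(achieved - target).max()
        if err < best_err:
            best, best_err = data, err
        # Move the intermediate correlations toward the target.
        intermediate = intermediate + (target - achieved)
        np.fill_diagonal(intermediate, 1.0)
    return best, best_err


# Usage with made-up skewed marginals (assumed values, not Facebook data).
demo_rng = np.random.default_rng(1)
demo_marginals = [
    demo_rng.exponential(1.0, 4000),
    demo_rng.gamma(2.0, 1.0, 4000),
    demo_rng.uniform(0.0, 1.0, 4000),
]
demo_target = np.array([[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]])
demo_data, demo_err = simulate_correlated(demo_marginals, demo_target)
print("largest absolute correlation error:", demo_err)
```

Because each output column is a permutation of its input sample, the marginal distributions are preserved exactly, and only the pairing of values across columns changes between iterations.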
ISSN:2169-3536
DOI:10.1109/ACCESS.2020.2970862