Graphine: Programming Graph-Parallel Computation of Large Natural Graphs for Multicore Clusters

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, Vol. 27, no. 6, pp. 1647-1659
Main Authors: Yan, Jie; Tan, Guangming; Mo, Zeyao; Sun, Ninghui
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01-06-2016
Description
Summary: Graph-parallel computation has become a crucial component in emerging applications of web search, data analytics and machine learning. In practice, most graphs derived from real-world phenomena are very large and scale-free. Unfortunately, distributed graph-parallel computation of these natural graphs still suffers from strong scalability issues on contemporary multicore clusters. To embrace the multicore architecture in distributed graph-parallel computation, we propose the framework Graphine, which features (i) a Scatter-Combine computation abstraction, evolved from the traditional vertex-centric approach by fusing the paired scatter and gather operations, executed separately on two edge sides, into a one-sided scatter; coupled with an active message mechanism, it potentially reduces intermediate message cost and enables fine-grained parallelism on multicore architecture; and (ii) an Agent-Graph data model, which leverages an idea similar to vertex-cut but conceptually splits the remote replica into two agent types, scatter and combiner, resulting in less communication. We implement the Graphine framework and evaluate it using several representative algorithms on six large real-world graphs and a series of synthetic graphs with power-law degree distributions. We show that Graphine achieves sublinear scalability with the number of cores per node, number of nodes, and graph sizes (up to one billion vertices), and is 2 to 15 times faster than the state-of-the-art PowerGraph on a cluster of 16 multicore nodes.
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2015.2453978
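
Illustrative note: the Scatter-Combine idea in the summary (a one-sided scatter whose result is folded into the destination vertex, in place of a paired scatter and gather) might be sketched roughly as below. This is a single-process toy, not Graphine's API. The function name `scatter_combine_pagerank`, its parameters, and the choice of PageRank as the example algorithm are all assumptions made here for illustration; the record does not name the algorithms evaluated, and the sketch models neither active messages nor the Agent-Graph data model.

```python
def scatter_combine_pagerank(edges, num_vertices, damping=0.85, iterations=20):
    """Toy PageRank in a scatter-combine style (hypothetical, single process).

    edges: list of (src, dst) pairs; every vertex is assumed to have at
    least one out-edge so that no rank mass is lost.
    """
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        # One accumulator per destination vertex; the combine operator is '+'.
        combined = [0.0] * num_vertices
        for src, dst in edges:
            # One-sided scatter: the source side computes its contribution
            # and the destination folds it in immediately, so no separate
            # gather pass over in-edges is needed.
            combined[dst] += rank[src] / out_degree[src]
        # Apply step: turn the combined contributions into the new rank.
        rank = [(1.0 - damping) / num_vertices + damping * c for c in combined]
    return rank


if __name__ == "__main__":
    # A directed 3-cycle: by symmetry every vertex keeps the same rank.
    print(scatter_combine_pagerank([(0, 1), (1, 2), (2, 0)], 3))
```

Because the combine operator here is commutative and associative, contributions can be folded in any order, which is the property that would let a real framework apply them in parallel or per message.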