Graphine: Programming Graph-Parallel Computation of Large Natural Graphs for Multicore Clusters
Graph-parallel computation has become a crucial component in emerging applications of web search, data analytics and machine learning. In practice, most graphs derived from real-world phenomena are very large and scale-free. Unfortunately, distributed graph-parallel computation of these natural grap...
Saved in:
Published in: | IEEE transactions on parallel and distributed systems Vol. 27; no. 6; pp. 1647 - 1659 |
---|---|
Main Authors: | , , , |
Format: | Journal Article |
Language: | English |
Published: |
New York
IEEE
01-06-2016
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Graph-parallel computation has become a crucial component in emerging applications of web search, data analytics and machine learning. In practice, most graphs derived from real-world phenomena are very large and scale-free. Unfortunately, distributed graph-parallel computation of these natural graphs still suffers strong scalability issues on contemporary multicore clusters. To embrace the multicore architecture in distributed graph-parallel computation, we propose the framework Graphine, which features (i) A Scatter-Combine computation abstraction that is evolved from the traditional vertex-centric approach by fusing the paired scatter and gather operations, executed separately on two edge sides, into a one-sided scatter. Further coupled with active message mechanism, it potentially reduces intermediate message cost and enables fine-grained parallelism on multicore architecture. (ii) An Agent-Graph data model, which leverages an idea similar to vertex-cut but conceptually splits the remote replica into two agent types of scatter and combiner, resulting in less communication. We implement the Graphine framework and evaluate it using several representative algorithms on six large real-world graphs and a series of synthetic graphs with power-law degree distributions. We show that Graphine achieves sublinear scalability with the number of cores per node, number of nodes, and graph sizes (up to one billion vertices), and is 2~15 times faster than the state-of-the-art PowerGraph on a cluster of 16 multicore nodes. |
---|---|
ISSN: | 1045-9219 1558-2183 |
DOI: | 10.1109/TPDS.2015.2453978 |