GPU concurrency choices in graph analytics

Bibliographic Details
Published in: 2016 IEEE International Symposium on Workload Characterization (IISWC), pp. 1-10
Main Authors: Ahmad, Masab; Khan, Omer
Format: Conference Proceeding
Language: English
Published: IEEE, 01-09-2016
Description
Summary: Graph analytics is becoming ever more ubiquitous in today's world. However, situational dynamic changes in input graphs, such as changes in traffic and weather patterns, lead to variations in concurrency. Moreover, graph algorithms are known to have data-dependent loops and fine-grain synchronization, which make them hard to scale on parallel machines. Recent trends in computing indicate the rise of massively threaded machines, such as Graphics Processing Units (GPUs). It is therefore of paramount importance to map graph algorithms efficiently onto these GPU machines; however, concurrency variations are expected to play a formidable role in achieving good GPU performance. This paper performs an in-depth characterization of GPU architectural choices for graph benchmarks executing on a diverse set of input graphs. The analysis shows that performance improves by a geometric mean of 40% when the optimal number of threads is spawned on a GPU, relative to a naive choice that maximizes total thread count. A further 41% improvement is achieved when the number of threads per GPU work group is reduced to a setting that optimizes exploitable hardware concurrency. It is also shown that algorithmic auto-tuning, coupled with the right architectural choices, co-optimizes GPU performance.
DOI: 10.1109/IISWC.2016.7581278
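
The two knobs the abstract highlights, the total number of threads spawned and the number of threads per work group, correspond directly to the grid and block dimensions of a GPU kernel launch. The sketch below is a minimal, hypothetical illustration of that kind of launch-configuration sweep for a simple per-vertex graph kernel in CUDA; the kernel, the synthetic graph, and the block-size range are assumptions made for illustration and are not taken from the paper.

// Hypothetical sketch: sweep the CUDA block size for a simple per-vertex
// graph kernel and report the fastest launch configuration. Illustrative
// only; not the benchmark or methodology used in the paper.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Per-vertex kernel over a CSR graph: sum the degrees of each vertex's neighbors.
__global__ void neighbor_degree_sum(const int *row_ptr, const int *col_idx,
                                    int *out, int num_vertices) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= num_vertices) return;
    int sum = 0;
    for (int e = row_ptr[v]; e < row_ptr[v + 1]; ++e) {
        int u = col_idx[e];
        sum += row_ptr[u + 1] - row_ptr[u];   // degree of neighbor u
    }
    out[v] = sum;
}

int main() {
    // Tiny synthetic ring graph (each vertex has two neighbors) so the
    // example is self-contained.
    const int n = 1 << 20;
    std::vector<int> row_ptr(n + 1), col_idx(2 * n);
    for (int v = 0; v < n; ++v) {
        row_ptr[v] = 2 * v;
        col_idx[2 * v]     = (v + n - 1) % n;
        col_idx[2 * v + 1] = (v + 1) % n;
    }
    row_ptr[n] = 2 * n;

    int *d_row, *d_col, *d_out;
    cudaMalloc((void **)&d_row, (n + 1) * sizeof(int));
    cudaMalloc((void **)&d_col, 2 * n * sizeof(int));
    cudaMalloc((void **)&d_out, n * sizeof(int));
    cudaMemcpy(d_row, row_ptr.data(), (n + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col, col_idx.data(), 2 * n * sizeof(int), cudaMemcpyHostToDevice);

    // Warm-up launch so context-initialization cost does not skew the sweep.
    neighbor_degree_sum<<<(n + 255) / 256, 256>>>(d_row, d_col, d_out, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int best_block = 0;
    float best_ms = 1e30f;
    for (int block = 32; block <= 1024; block *= 2) {
        int grid = (n + block - 1) / block;
        cudaEventRecord(start);
        neighbor_degree_sum<<<grid, block>>>(d_row, d_col, d_out, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("block=%4d  grid=%7d  time=%.3f ms\n", block, grid, ms);
        if (ms < best_ms) { best_ms = ms; best_block = block; }
    }
    printf("best block size: %d (%.3f ms)\n", best_block, best_ms);

    cudaFree(d_row); cudaFree(d_col); cudaFree(d_out);
    return 0;
}

Each candidate block size is timed with CUDA events and the fastest configuration is reported, mirroring the abstract's point that the best launch configuration depends on the input graph and should be chosen empirically rather than by simply maximizing the total thread count.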