High-Performance and Programmable Attentional Graph Neural Networks with Global Tensor Formulations

Bibliographic Details
Published in:	SC23: International Conference for High Performance Computing, Networking, Storage and Analysis pp. 1 - 18
Main Authors:	Besta, Maciej; Renc, Pawel; Gerstenberger, Robert; Labini, Paolo Sylos; Ziogas, Alexandros; Chen, Tiancheng; Gianinazzi, Lukas; Scheidl, Florian; Szenes, Kalman; Carigiet, Armon; Iff, Patrick; Kwasniewski, Grzegorz; Kanakagiri, Raghavendra; Ge, Chio; Jaeger, Sammy; Was, Jaroslaw; Vella, Flavio; Hoefler, Torsten
Format:	Conference Proceeding
Language:	English
Published:	ACM 11-11-2023
Subjects:	Graph Attention Models; Graph neural networks; High performance computing; Libraries; Scalability; Sparse-Dense Tensor Operations; Tensors; Training; Vectors

Description
Summary:	Graph attention models (A-GNNs), a type of Graph Neural Networks (GNNs), have been shown to be more powerful than simpler convolutional GNNs (C-GNNs). However, A-GNNs are more complex to program and difficult to scale. To address this, we develop a novel mathematical formulation, based on tensors that group all the feature vectors, targeting both training and inference of A-GNNs. The formulation enables straightforward adoption of communication-minimizing routines, it fosters optimizations such as vectorization, and it enables seamless integration with established linear algebra DSLs or libraries such as GraphBLAS. Our implementation uses a data redistribution scheme explicitly developed for sparse-dense tensor operations used heavily in GNNs, and fusing optimizations that further minimize memory usage and communication cost. We ensure theoretical asymptotic reductions in communicated data compared to the established message-passing GNN paradigm. Finally, we provide excellent scalability and speedups of even 4-5x over modern libraries such as Deep Graph Library.
ISSN:	2167-4337
DOI:	10.1145/3581784.3607067
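
Illustrative note: the summary contrasts a formulation "based on tensors that group all the feature vectors" with the per-vertex message-passing paradigm. The sketch below is not taken from the paper. It assumes a standard single-head graph attention layer (GAT-style scoring with LeakyReLU and a softmax over neighbors) and shows, on a tiny graph, that the per-vertex loop and a whole-matrix version give the same result. All names (`gat_layer_global`, `gat_layer_per_vertex`, `a_src`, `a_dst`) are hypothetical, and a dense adjacency mask stands in for the sparse-dense tensor operations the summary mentions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer_global(A, H, W, a_src, a_dst):
    """Attention layer written on whole matrices (assumed GAT-style scoring)."""
    Z = H @ W                                # transform all feature vectors at once
    s = Z @ a_src                            # per-vertex "source" score term
    t = Z @ a_dst                            # per-vertex "destination" score term
    E = leaky_relu(s[:, None] + t[None, :])  # score for every vertex pair (i, j)
    E = np.where(A > 0, E, -np.inf)          # keep scores only where an edge exists
    E = E - E.max(axis=1, keepdims=True)     # stabilize the row-wise softmax
    P = np.exp(E)
    P = P / P.sum(axis=1, keepdims=True)     # attention weights over each neighborhood
    return P @ Z                             # aggregate neighbor features

def gat_layer_per_vertex(A, H, W, a_src, a_dst):
    """Same layer written as an explicit loop over vertices and their neighbors."""
    n = A.shape[0]
    out = np.zeros((n, W.shape[1]))
    for i in range(n):
        nbrs = np.nonzero(A[i])[0]
        zi = H[i] @ W
        scores = np.array(
            [leaky_relu(zi @ a_src + (H[j] @ W) @ a_dst) for j in nbrs]
        )
        w = np.exp(scores - scores.max())
        w = w / w.sum()
        out[i] = sum(wj * (H[j] @ W) for wj, j in zip(w, nbrs))
    return out

# Tiny random example; self-loops guarantee every vertex has a neighbor.
rng = np.random.default_rng(0)
n, k_in, k_out = 5, 4, 3
A = (rng.random((n, n)) < 0.4).astype(float)
np.fill_diagonal(A, 1.0)
H = rng.standard_normal((n, k_in))
W = rng.standard_normal((k_in, k_out))
a_src = rng.standard_normal(k_out)
a_dst = rng.standard_normal(k_out)

out_global = gat_layer_global(A, H, W, a_src, a_dst)
out_loop = gat_layer_per_vertex(A, H, W, a_src, a_dst)
print(np.allclose(out_global, out_loop))
```

In the whole-matrix version every step is a matrix product, a broadcast, or a row-wise reduction, which is the kind of structure that linear algebra libraries can vectorize. A practical implementation would keep `A` and the score matrix sparse rather than dense as done here.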