A Split Execution Model for SpTRSV

Bibliographic Details
Published in:	IEEE Transactions on Parallel and Distributed Systems, Vol. 32, no. 11, pp. 2809-2822
Main Authors:	Ahmad, Najeeb; Yilmaz, Buse; Unat, Didem
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 01-11-2021; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Central processing units; Computational modeling; CPU-GPU computing; CPUs; Fats; Graphics processing units; heterogeneous computing; Kernel; Parallel algorithms; Phased arrays; sparse linear systems; Sparse matrices; Sparse triangular solve; Sparsity; SpTRSV; SpTS

Description
Summary:	Sparse Triangular Solve (SpTRSV) is an important and extensively used kernel in scientific computing. Parallelism within SpTRSV depends on the matrix sparsity pattern and, in many cases, is non-uniform from one computational step to the next. When the SpTRSV computational steps have contrasting parallelism characteristics (some steps are more parallel, others more sequential in nature), the performance of an SpTRSV algorithm may be limited by this contrast. In this work, we propose a split-execution model for SpTRSV that automatically divides the SpTRSV computation into two sub-SpTRSV systems and an SpMV, such that one of the sub-SpTRSVs has more parallelism than the other. Each sub-SpTRSV is then computed using a different SpTRSV algorithm, possibly executed on a different platform (CPU or GPU). By analyzing the SpTRSV Directed Acyclic Graph (DAG) and matrix sparsity features, we use a heuristics-based approach to (i) automatically determine the suitability of an SpTRSV for split-execution, (ii) find the appropriate split-point, and (iii) execute SpTRSV in a split fashion using two SpTRSV algorithms while managing any required inter-platform communication. Experimental evaluation of the execution model on two CPU-GPU machines with a dataset of 327 matrices from the SuiteSparse Matrix Collection shows that our approach correctly selects the fastest SpTRSV method (split or unsplit) for 88 percent of matrices on the Intel Xeon Gold (6148) + NVIDIA Tesla V100 platform and 83 percent on the Intel Core i7 + NVIDIA G1080 Ti platform, achieving speedups of up to 10x and 6.36x, respectively.
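
The decomposition named in the summary (two sub-SpTRSVs joined by one SpMV) can be illustrated with a minimal sketch. It assumes a lower-triangular matrix partitioned as L = [[L11, 0], [L21, L22]] and a split point supplied by the caller. The function names, the row-list data layout, and the sample matrix are all hypothetical. The sketch does not reproduce the authors' split-point heuristics or their CPU/GPU execution; both sub-solves here run sequentially in plain Python.

```python
# Illustrative sketch only: names and data layout are assumptions,
# not the implementation described in the article.

def forward_substitution(rows, diag, b):
    """Solve L x = b for a sparse lower-triangular L.

    rows[i] holds (column, value) pairs strictly below the diagonal,
    diag[i] holds the diagonal entry of row i.
    """
    x = [0.0] * len(b)
    for i, off_diag in enumerate(rows):
        acc = b[i]
        for j, v in off_diag:
            acc -= v * x[j]          # uses only already-computed unknowns
        x[i] = acc / diag[i]
    return x


def split_solve(rows, diag, b, split):
    """Solve L x = b as two smaller triangular solves and one SpMV.

    With `split` the dimension of L11:
      1. L11 x1 = b1
      2. b2' = b2 - L21 x1      (sparse matrix-vector product)
      3. L22 x2 = b2'
    """
    # Step 1: rows above the split reference only columns < split.
    x1 = forward_substitution(rows[:split], diag[:split], b[:split])

    # Step 2: fold the L21 entries into the right-hand side and
    # collect the L22 entries, re-indexed to start at column 0.
    l22_rows, b2 = [], []
    for i in range(split, len(b)):
        rhs = b[i]
        local = []
        for j, v in rows[i]:
            if j < split:
                rhs -= v * x1[j]               # L21 contribution
            else:
                local.append((j - split, v))   # L22 entry
        l22_rows.append(local)
        b2.append(rhs)

    # Step 3: the second, independent triangular solve.
    x2 = forward_substitution(l22_rows, diag[split:], b2)
    return x1 + x2


# Hypothetical 4x4 example: L = [[2,0,0,0],[1,1,0,0],[0,3,4,0],[1,0,2,5]]
demo_rows = [[], [(0, 1.0)], [(1, 3.0)], [(0, 1.0), (2, 2.0)]]
demo_diag = [2.0, 1.0, 4.0, 5.0]
demo_b = [2.0, 3.0, 18.0, 27.0]
demo_x = split_solve(demo_rows, demo_diag, demo_b, split=2)
```

For this sample system, `demo_x` matches the result of calling `forward_substitution` on the whole matrix. Because steps 1 and 3 are ordinary triangular solves, each could in principle be handed to a different solver, which is the property the split-execution model relies on.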
ISSN:	1045-9219, 1558-2183
DOI:	10.1109/TPDS.2021.3074501