An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication

Sparse matrix-vector multiplication (SpM × V) has been characterized as one of the most significant computational scientific kernels. The key algorithmic characteristic of the SpM × V kernel, that inhibits it from achieving high performance, is its very low flop:byte ratio. In this paper, we present...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on parallel and distributed systems Vol. 24; no. 10; pp. 1930 - 1940
Main Authors: Karakasis, V., Gkountouvas, T., Kourtis, K., Goumas, G., Koziris, N.
Format: Journal Article
Language:English
Published: IEEE 01-10-2013
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Sparse matrix-vector multiplication (SpM × V) has been characterized as one of the most significant computational scientific kernels. The key algorithmic characteristic of the SpM × V kernel, that inhibits it from achieving high performance, is its very low flop:byte ratio. In this paper, we present a compressed storage format, called Compressed Sparse eXtended (CSX), that is able to detect and encode simultaneously multiple commonly encountered substructures inside a sparse matrix. Relying on aggressive compression techniques of the sparse matrix's indexing structure, CSX is able to considerably reduce the memory footprint of a sparse matrix, alleviating the pressure to the memory subsystem. In a diverse set of sparse matrices, CSX was able to provide a more than 40 percent average performance improvement over the standard CSR format in SMP architectures and surpassed 20 percent improvement in NUMA systems, significantly outperforming other CSR alternatives. Additionally, it was able to adapt successfully to the nonzero element structure of the considered matrices, exhibiting very stable performance. Finally, in the context of a "real-life" multiphysics simulation software, CSX accelerated the SpM × V component nearly 40 percent and the total solver time approximately 15 percent.
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2012.290