Iterative sparse matrix-vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi-graphics processing unit systems

Bibliographic Details
Published in: Concurrency and computation, Vol. 25, no. 4, pp. 586-603
Main Authors: Schmidt, Bertil; Aribowo, Hans; Dang, Hoang-Vu
Format: Journal Article
Language: English
Published: Blackwell Publishing Ltd 01-02-2013
Description
Summary: The block Wiedemann (BW) algorithm is frequently used to solve sparse linear systems over GF(2). Iterative sparse matrix-vector multiplication is the most time-consuming operation. The necessity to accelerate this step is motivated by the application of BW to very large matrices used in the linear algebra step of the number field sieve (NFS) for integer factorization. In this paper, we derive an efficient CUDA implementation of this operation by using a newly designed hybrid sparse matrix format. This leads to speedups between 4 and 8 on a single graphics processing unit (GPU) for a number of tested NFS matrices compared with an optimized multicore implementation. We further present a GPU cluster implementation of the full BW for NFS matrices. A small-sized GPU cluster is able to outperform CPU clusters of larger size for large matrices such as the one obtained from the Kilobit special NFS factorization. Copyright © 2012 John Wiley & Sons, Ltd.
Bibliography:ark:/67375/WNG-ZBS88PH2-D
istex:A95318FD4BD902131B4F774EEB7966939EAECC52
ArticleID:CPE2896
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.2896
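
The operation the summary identifies as the bottleneck, iterated sparse matrix-vector multiplication over GF(2), can be illustrated with a minimal sketch. This is not the paper's hybrid sparse matrix format or its CUDA kernel; it assumes a plain list of column indices per row, and the names `spmv_gf2`, `iterate_spmv`, and `rows` are hypothetical. It relies only on two standard facts: addition in GF(2) is XOR, and a block of 64 vectors can be packed into one 64-bit word per matrix column so that a single XOR advances all 64 vectors at once.

```python
# Hypothetical sketch of iterated SpMV over GF(2).
# Assumption: the matrix is stored as per-row lists of nonzero column indices,
# not in the hybrid format described in the paper.

MASK64 = (1 << 64) - 1  # keep each packed word to 64 bits


def spmv_gf2(rows, x):
    """Return y = A * x over GF(2).

    rows[i] lists the column indices of the nonzero entries in row i of A.
    x[j] is an integer packing entry j of up to 64 vectors, one bit per vector.
    """
    y = []
    for cols in rows:
        acc = 0
        for j in cols:
            acc ^= x[j]  # addition in GF(2) is XOR
        y.append(acc & MASK64)
    return y


def iterate_spmv(rows, x, steps):
    """Apply a square matrix A to x repeatedly, returning A^steps * x."""
    for _ in range(steps):
        x = spmv_gf2(rows, x)
    return x


# 3x3 example: A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
rows = [[0, 1], [1, 2], [0, 2]]
x = [0b01, 0b10, 0b11]  # two vectors packed into the low two bits
print(spmv_gf2(rows, x))         # [3, 1, 2]
print(iterate_spmv(rows, x, 2))  # [2, 3, 1]
```

Because every step reads the full matrix and only XORs words, the cost is dominated by memory traffic, which is one reason the choice of sparse matrix format matters for a GPU implementation.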