Iterative sparse matrix-vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi-graphics processing unit systems
Published in: Concurrency and Computation: Practice and Experience, Vol. 25, No. 4, pp. 586-603
Main Authors:
Format: Journal Article
Language: English
Published: Blackwell Publishing Ltd, 01-02-2013
Summary: The block Wiedemann (BW) algorithm is frequently used to solve sparse linear systems over GF(2). Iterative sparse matrix-vector multiplication is the most time-consuming operation. The necessity to accelerate this step is motivated by the application of BW to very large matrices used in the linear algebra step of the number field sieve (NFS) for integer factorization. In this paper, we derive an efficient CUDA implementation of this operation by using a newly designed hybrid sparse matrix format. This leads to speedups between 4 and 8 on a single graphics processing unit (GPU) for a number of tested NFS matrices compared with an optimized multicore implementation. We further present a GPU cluster implementation of the full BW for NFS matrices. A small-sized GPU cluster is able to outperform CPU clusters of larger size for large matrices such as the one obtained from the Kilobit special NFS factorization. Copyright © 2012 John Wiley & Sons, Ltd.
Bibliography: ArticleID CPE2896; ark:/67375/WNG-ZBS88PH2-D; istex A95318FD4BD902131B4F774EEB7966939EAECC52
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.2896
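
The summary above centers on one operation: repeated sparse matrix-vector products over GF(2), where the "vector" is in fact a block of vectors. As a minimal sketch only (not the paper's newly designed hybrid format, and with all names, the CSR storage, and the one-thread-per-row mapping as illustrative assumptions), a CUDA kernel for one such product could look like this: each output word is the XOR of the 64-bit input words selected by a row's nonzero columns, since addition in GF(2) is XOR and every stored entry is 1.

```cuda
// Sketch: one block-SpMV step over GF(2) for a 0/1 matrix stored in CSR.
// The block of 64 right-hand-side vectors is packed as one uint64_t per column;
// GF(2) addition is bitwise XOR, and "multiplying" by a stored 1 is just a gather.
// Hypothetical names and layout; the paper uses a custom hybrid sparse format instead.
#include <cstdint>

__global__ void spmv_gf2_csr(int num_rows,
                             const int*      __restrict__ row_ptr,  // CSR row offsets (num_rows + 1 entries)
                             const int*      __restrict__ col_idx,  // CSR column indices
                             const uint64_t* __restrict__ x,        // input block: one 64-bit word per column
                             uint64_t*       __restrict__ y)        // output block: one 64-bit word per row
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per matrix row
    if (row >= num_rows) return;

    uint64_t acc = 0;                                   // GF(2) accumulator
    for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
        acc ^= x[col_idx[j]];                            // XOR in the vector word of each nonzero column
    y[row] = acc;
}
```

The block Wiedemann iteration would launch such a kernel repeatedly (for example with `spmv_gf2_csr<<<(num_rows + 255) / 256, 256>>>(...)`), feeding each output block back in as the next input; the paper's contribution is a sparse format and multi-GPU distribution that make exactly this repeated step fast on NFS matrices.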