Balancing Locality and Concurrency: Solving Sparse Triangular Systems on GPUs
Many numerical optimisation problems rely on fast algorithms for solving sparse triangular systems of linear equations (STLs). To accelerate the solution of such equations, two types of approaches have been used: on GPUs, concurrency has been prioritised to the disadvantage of data locality, while o...
Published in: | 2016 IEEE 23rd International Conference on High Performance Computing (HiPC) pp. 183 - 192 |
---|---|
Main Authors: | Picciau, Andrea; Inggs, Gordon E.; Wickerson, John; Kerrigan, Eric C.; Constantinides, George A. |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-12-2016 |
Abstract | Many numerical optimisation problems rely on fast algorithms for solving sparse triangular systems of linear equations (STLs). To accelerate the solution of such equations, two types of approaches have been used: on GPUs, concurrency has been prioritised to the disadvantage of data locality, while on multi-core CPUs, data locality has been prioritised to the disadvantage of concurrency. In this paper, we discuss the interaction between data locality and concurrency in the solution of STLs on GPUs, and we present a new algorithm that balances both. We demonstrate empirically that, subject to there being enough concurrency available in the input matrix, our algorithm outperforms Nvidia's concurrency-prioritising CUSPARSE algorithm for GPUs. Experimental results show a maximum speedup of 5.8-fold. Our solution algorithm, which we have implemented in OpenCL, requires a pre-processing phase that partitions the graph associated with the input matrix into sub-graphs, whose data can be stored in low-latency local memories. This preliminary analysis phase is expensive, but because it depends only on the input matrix, its cost can be amortised when solving for many different right-hand sides. |
---|---|
Author | Kerrigan, Eric C.; Inggs, Gordon E.; Constantinides, George A.; Picciau, Andrea; Wickerson, John |
Author_xml | 1. Picciau, Andrea (a.picciau13@imperial.ac.uk); 2. Inggs, Gordon E. (g.inggs11@imperial.ac.uk); 3. Wickerson, John (j.wickerson@imperial.ac.uk); 4. Kerrigan, Eric C. (e.kerrigan@imperial.ac.uk); 5. Constantinides, George A. (g.constantinides@imperial.ac.uk). Organization for all authors: Dept. of Electr. & Electron. Eng., Imperial Coll. London, London, UK |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DOI | 10.1109/HiPC.2016.030 |
EISBN | 1509054111 9781509054114 |
EndPage | 192 |
ExternalDocumentID | 7839683 |
Genre | orig-research |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
OpenAccessLink | http://spiral.imperial.ac.uk/bitstream/10044/1/40611/2/AndreaHiPC16.pdf |
PageCount | 10 |
PublicationCentury | 2000 |
PublicationDate | 2016-Dec. |
PublicationDateYYYYMMDD | 2016-12-01 |
PublicationDate_xml | – month: 12 year: 2016 text: 2016-Dec. |
PublicationDecade | 2010 |
PublicationTitle | 2016 IEEE 23rd International Conference on High Performance Computing (HiPC) |
PublicationTitleAbbrev | HIPC |
PublicationYear | 2016 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
StartPage | 183 |
SubjectTerms | Algorithm design and analysis concurrency Concurrent computing Context CUSPARSE data locality Data structures GPU Graphics processing units linear algebra OpenCL Partitioning algorithms sparse Sparse matrices systems of equations |
Title | Balancing Locality and Concurrency: Solving Sparse Triangular Systems on GPUs |
URI | https://ieeexplore.ieee.org/document/7839683 |
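
The abstract notes that the analysis phase depends only on the input matrix, so its cost can be amortised over many right-hand sides. The sketch below illustrates that idea only. It is not the paper's graph-partitioning algorithm: it uses classic level scheduling, a simple concurrency-exposing analysis for lower-triangular systems, written in plain Python. The function names `analyse_levels` and `solve_lower`, the CSR array names, and the small example matrix are all assumptions made for illustration.

```python
from collections import defaultdict


def analyse_levels(indptr, indices):
    """One-off analysis of a lower-triangular CSR matrix (hypothetical helper).

    Row i depends on every x[j] with a stored off-diagonal entry (j < i).
    Rows that share a level have no mutual dependencies, so a parallel
    device could process them concurrently.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    groups = defaultdict(list)
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return [groups[lv] for lv in sorted(groups)]


def solve_lower(indptr, indices, data, levels, b):
    """Forward substitution that visits rows level by level."""
    x = [0.0] * len(b)
    for rows in levels:
        for i in rows:  # iterations within one level are independent
            acc, diag = b[i], None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    acc -= data[k] * x[j]
            if diag is None:
                raise ValueError(f"row {i} has no stored diagonal entry")
            x[i] = acc / diag
    return x


# Example (assumed data): L = [[2,0,0,0],[0,1,0,0],[1,0,4,0],[0,3,1,1]] in CSR form.
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 1.0]

levels = analyse_levels(indptr, indices)  # computed once per matrix
x1 = solve_lower(indptr, indices, data, levels, [2.0, 1.0, 5.0, 6.0])
x2 = solve_lower(indptr, indices, data, levels, [4.0, 2.0, 2.0, 8.0])  # analysis reused
```

Here `levels` is computed once and reused for both right-hand sides, which is the amortisation the abstract describes. The locality side of the paper's trade-off (keeping each sub-graph's data in low-latency local memory) is not modelled in this sketch.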