Balancing Locality and Concurrency: Solving Sparse Triangular Systems on GPUs

Many numerical optimisation problems rely on fast algorithms for solving sparse triangular systems of linear equations (STLs). To accelerate the solution of such equations, two types of approaches have been used: on GPUs, concurrency has been prioritised to the disadvantage of data locality, while o...

Bibliographic Details
Published in:	2016 IEEE 23rd International Conference on High Performance Computing (HiPC) pp. 183 - 192
Main Authors:	Picciau, Andrea; Inggs, Gordon E.; Wickerson, John; Kerrigan, Eric C.; Constantinides, George A.
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01-12-2016
Subjects:	Algorithm design and analysis; concurrency; Concurrent computing; Context; CUSPARSE; data locality; Data structures; GPU; Graphics processing units; linear algebra; OpenCL; Partitioning algorithms; sparse; Sparse matrices; systems of equations

Abstract	Many numerical optimisation problems rely on fast algorithms for solving sparse triangular systems of linear equations (STLs). To accelerate the solution of such equations, two types of approaches have been used: on GPUs, concurrency has been prioritised to the disadvantage of data locality, while on multi-core CPUs, data locality has been prioritised to the disadvantage of concurrency. In this paper, we discuss the interaction between data locality and concurrency in the solution of STLs on GPUs, and we present a new algorithm that balances both. We demonstrate empirically that, subject to there being enough concurrency available in the input matrix, our algorithm outperforms Nvidia's concurrency-prioritising CUSPARSE algorithm for GPUs. Experimental results show a maximum speedup of 5.8-fold. Our solution algorithm, which we have implemented in OpenCL, requires a pre-processing phase that partitions the graph associated with the input matrix into sub-graphs, whose data can be stored in low-latency local memories. This preliminary analysis phase is expensive, but because it depends only on the input matrix, its cost can be amortised when solving for many different right-hand sides.
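The split between a one-off analysis phase and a repeated solve phase can be illustrated with a minimal sketch. It is not the partitioning algorithm from the paper: it uses plain level scheduling, a generic way to expose concurrency in a triangular solve, and it runs sequentially on the CPU. The function names `analyse_levels` and `solve_lower`, the row-list matrix layout, and the example matrix are all assumptions made for illustration.

```python
from collections import defaultdict


def analyse_levels(rows):
    """Analysis phase (hypothetical helper): group row indices into levels.

    rows[i] is a list of (column, value) pairs for row i of a lower
    triangular matrix, diagonal entry included (column == i).
    Row i depends on every row j < i that appears as a column in it.
    The result depends only on the matrix, not on any right-hand side.
    """
    level = [0] * len(rows)
    for i, entries in enumerate(rows):
        deps = [level[j] for j, _ in entries if j != i]
        level[i] = 1 + max(deps) if deps else 0
    groups = defaultdict(list)
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return [groups[lv] for lv in sorted(groups)]


def solve_lower(rows, levels, b):
    """Solve phase (hypothetical helper): forward substitution by level."""
    x = [0.0] * len(rows)
    for rows_in_level in levels:
        # Rows within one level do not depend on each other, so a GPU
        # could process them concurrently; here they run one by one.
        for i in rows_in_level:
            acc = b[i]
            diag = None
            for j, v in rows[i]:
                if j == i:
                    diag = v
                else:
                    acc -= v * x[j]
            x[i] = acc / diag
    return x


# Illustrative 4x4 lower triangular matrix (made up for this sketch):
# [2 0 0 0]
# [0 4 0 0]
# [1 0 1 0]
# [0 2 3 5]
rows = [
    [(0, 2.0)],
    [(1, 4.0)],
    [(0, 1.0), (2, 1.0)],
    [(1, 2.0), (2, 3.0), (3, 5.0)],
]

levels = analyse_levels(rows)  # computed once for the matrix

# The same analysis result is reused for two different right-hand sides.
x1 = solve_lower(rows, levels, [2.0, 4.0, 3.0, 13.0])
x2 = solve_lower(rows, levels, [4.0, 8.0, 2.0, 9.0])
print(levels, x1, x2)
```

Because `analyse_levels` looks only at the sparsity pattern, its cost is paid once and shared by every subsequent call to `solve_lower`, which is the amortisation argument the abstract makes for its own, more expensive, graph-partitioning pre-processing phase.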
Author	Kerrigan, Eric C.; Inggs, Gordon E.; Constantinides, George A.; Picciau, Andrea; Wickerson, John
Author_xml	– sequence: 1 givenname: Andrea surname: Picciau fullname: Picciau, Andrea email: a.picciau13@imperial.ac.uk organization: Dept. of Electr. & Electron. Eng., Imperial Coll. London, London, UK – sequence: 2 givenname: Gordon E. surname: Inggs fullname: Inggs, Gordon E. email: g.inggs11@imperial.ac.uk organization: Dept. of Electr. & Electron. Eng., Imperial Coll. London, London, UK – sequence: 3 givenname: John surname: Wickerson fullname: Wickerson, John email: j.wickerson@imperial.ac.uk organization: Dept. of Electr. & Electron. Eng., Imperial Coll. London, London, UK – sequence: 4 givenname: Eric C. surname: Kerrigan fullname: Kerrigan, Eric C. email: e.kerrigan@imperial.ac.uk organization: Dept. of Electr. & Electron. Eng., Imperial Coll. London, London, UK – sequence: 5 givenname: George A. surname: Constantinides fullname: Constantinides, George A. email: g.constantinides@imperial.ac.uk organization: Dept. of Electr. & Electron. Eng., Imperial Coll. London, London, UK
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/HiPC.2016.030
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library Online IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library Online url: http://ieeexplore.ieee.org/Xplore/DynWel.jsp sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	1509054111 9781509054114
EndPage	192
ExternalDocumentID	7839683
Genre	orig-research
GroupedDBID	6IE 6IL CBEJK RIE RIL
ID	FETCH-LOGICAL-i1290-fa3fd9a870988d6097eb49720d99533238037b4300a1f83ac860f59f5c3b1633
IEDL.DBID	RIE
IngestDate	Thu Jun 29 18:37:40 EDT 2023
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i1290-fa3fd9a870988d6097eb49720d99533238037b4300a1f83ac860f59f5c3b1633
OpenAccessLink	http://spiral.imperial.ac.uk/bitstream/10044/1/40611/2/AndreaHiPC16.pdf
PageCount	10
ParticipantIDs	ieee_primary_7839683
PublicationCentury	2000
PublicationDate	2016-Dec.
PublicationDateYYYYMMDD	2016-12-01
PublicationDate_xml	– month: 12 year: 2016 text: 2016-Dec.
PublicationDecade	2010
PublicationTitle	2016 IEEE 23rd International Conference on High Performance Computing (HiPC)
PublicationTitleAbbrev	HIPC
PublicationYear	2016
Publisher	IEEE
Publisher_xml	– name: IEEE
Score	1.7824982
SourceID	ieee
SourceType	Publisher
StartPage	183
Title	Balancing Locality and Concurrency: Solving Sparse Triangular Systems on GPUs
URI	https://ieeexplore.ieee.org/document/7839683
hasFullText	1
inHoldings	1
isFullTextHit
isPrint