Almost optimal column-wise prefix-sum computation on the GPU

Bibliographic Details
Published in: The Journal of Supercomputing, Vol. 74, no. 4, pp. 1510-1521
Main Authors: Tokura, Hiroki; Fujita, Toru; Nakano, Koji; Ito, Yasuaki; Bordim, Jacir L.
Format: Journal Article
Language: English
Published: New York Springer US 01-04-2018
Springer Nature B.V
Description
Summary: Row-wise and column-wise prefix-sum computation of a matrix has many applications in image processing, such as computing the summed area table and the Euclidean distance map. It is known that the prefix-sums of a one-dimensional array can be computed efficiently on the GPU. Hence, row-wise prefix-sums of a matrix can also be computed efficiently on the GPU by executing this prefix-sum algorithm for every row in parallel. However, the same approach does not work well for computing column-wise prefix-sums, because it performs inefficient stride memory access to the global memory. The main contribution of this paper is to present an almost optimal column-wise prefix-sum algorithm on the GPU. Quite surprisingly, experimental results on an NVIDIA TITAN X show that our column-wise prefix-sum algorithm runs only 2–6% slower than matrix duplication. Thus, our column-wise prefix-sum algorithm is almost optimal.
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-018-2242-8
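
The terms used in the summary can be illustrated with a small sequential sketch. This is not the paper's GPU algorithm; it is a plain Python reference showing what row-wise prefix-sums, column-wise prefix-sums, and the summed area table compute. All function names are hypothetical. The last helper assumes a row-major memory layout and lists the flat indices touched when walking down one column, which shows the stride access the summary refers to.

```python
def row_prefix_sums(matrix):
    """Return a new matrix whose rows hold running sums, left to right."""
    result = []
    for row in matrix:
        running = 0
        out = []
        for value in row:
            running += value
            out.append(running)
        result.append(out)
    return result


def column_prefix_sums(matrix):
    """Return a new matrix whose columns hold running sums, top to bottom."""
    if not matrix:
        return []
    result = [list(matrix[0])]
    for row in matrix[1:]:
        previous = result[-1]
        # Each output row is the previous output row plus the current input row.
        result.append([p + v for p, v in zip(previous, row)])
    return result


def summed_area_table(matrix):
    """Row-wise prefix-sums followed by column-wise prefix-sums."""
    return column_prefix_sums(row_prefix_sums(matrix))


def column_flat_indices(width, height, column):
    """Flat indices of one column, assuming row-major storage (stride = width)."""
    return [r * width + column for r in range(height)]


if __name__ == "__main__":
    m = [[1, 2, 3],
         [4, 5, 6]]
    print(row_prefix_sums(m))
    print(column_prefix_sums(m))
    print(summed_area_table(m))
    print(column_flat_indices(width=3, height=2, column=1))
```

In this sketch consecutive elements of a row sit next to each other in the flat layout, while consecutive elements of a column are `width` positions apart, which is the access pattern a naive per-column scan would generate.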