An efficient code generation technique for tiled iteration spaces

This paper presents a novel approach for the problem of generating tiled code for nested for-loops, transformed by a tiling transformation. Tiling or supernode transformation has been widely used to improve locality in multilevel memory hierarchies, as well as to efficiently execute loops onto paral...

Full description

Saved in:

Bibliographic Details
Published in:	IEEE transactions on parallel and distributed systems Vol. 14; no. 10; pp. 1021 - 1034
Main Authors:	Goumas, G., Athanasaki, M., Koziris, N.
Format:	Journal Article
Language:	English
Published:	New York IEEE 01-10-2003 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Compilers Computer Society Concurrent computing Delay Hierarchies Iterative methods Multilevel Parallel architectures Parallel processing Parallelepipeds Processor scheduling Scheduling algorithm Studies Tiling Transformations Transforms
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	This paper presents a novel approach for the problem of generating tiled code for nested for-loops, transformed by a tiling transformation. Tiling or supernode transformation has been widely used to improve locality in multilevel memory hierarchies, as well as to efficiently execute loops onto parallel architectures. However, automatic code generation for tiled loops can be a very complex compiler work, especially when nonrectangular tile shapes and iteration space bounds are concerned. Our method considerably enhances previous work on rewriting tiled loops, by considering parallelepiped tiles and arbitrary iteration space shapes. In order to generate tiled code, we first enumerate all tiles containing points within the iteration space and, second, sweep all points within each tile. For the first subproblem, we refine upon previous results concerning the computation of new loop bounds of an iteration space that has been transformed by a nonunimodular transformation. For the second subproblem, we transform the initial parallelepiped tile into a rectangular one, in order to generate efficient code with the aid of a nonunimodular transformation matrix and its Hermite Normal Form (HNF). Experimental results show that the proposed method significantly accelerates the compilation process and generates much more efficient code.
Bibliography:	ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 23
ISSN:	1045-9219 1558-2183
DOI:	10.1109/TPDS.2003.1239870