Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors

Bibliographic Details
Published in: 2009 18th International Conference on Parallel Architectures and Compilation Techniques, pp. 348-357
Main Authors: Qingda Lu, Alias, C., Bondhugula, U., Henretty, T., Krishnamoorthy, S., Ramanujam, J., Rountev, A., Sadayappan, P., Yongjian Chen, Haibo Lin, Ngai, T.-f.
Format: Conference Proceeding
Language: English
Published: IEEE 01-09-2009
Description
Summary: With increasing numbers of cores, future CMPs (chip multi-processors) are likely to have a tiled architecture with a portion of shared L2 cache on each tile and a bank-interleaved distribution of the address space. Although such an organization is effective for avoiding access hot-spots, it can cause a significant number of non-local L2 accesses for many commonly occurring regular data access patterns. In this paper we develop a compile-time framework for data locality optimization via data layout transformation. Using a polyhedral model, the program's localizability is determined by analysis of its index set and array reference functions, followed by non-canonical data layout transformation to reduce non-local accesses for localizable computations. Simulation-based results on a 16-core 2D tiled CMP demonstrate the effectiveness of the approach. The developed program transformation technique is also useful in several other data layout transformation contexts.
ISBN: 9780769537719, 0769537715
ISSN: 1089-795X, 2641-7944
DOI: 10.1109/PACT.2009.36
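
Illustration (not from the paper): the summary's core observation, that a bank-interleaved address space makes most L2 accesses non-local for regular access patterns and that a non-canonical layout can localize them, can be made concrete with a toy model. The sketch below is a minimal illustration under stated assumptions: 64-byte cache lines, 8-byte elements, interleaving at cache-line granularity, and a 1D array block-partitioned across 16 cores. The function names `home_bank`, `canonical`, `transformed`, and `local_fraction` are hypothetical, and the hand-written remapping stands in for the paper's polyhedral analysis, which it does not reproduce.

```python
NUM_BANKS = 16                              # one L2 bank per tile (16-core CMP)
LINE_BYTES = 64                             # assumed cache-line size
ELEM_BYTES = 8                              # assumed element size
ELEMS_PER_LINE = LINE_BYTES // ELEM_BYTES   # elements per cache line


def home_bank(position):
    """L2 bank that holds the element stored at this array position,
    assuming consecutive cache lines are interleaved across banks."""
    line = (position * ELEM_BYTES) // LINE_BYTES
    return line % NUM_BANKS


def canonical(i, n):
    """Canonical layout: logical element i is stored at position i."""
    return i


def transformed(i, n):
    """Hypothetical non-canonical layout: every cache line of the block
    owned by core c is placed on a line number congruent to c mod NUM_BANKS.
    Assumes n is a multiple of NUM_BANKS * ELEMS_PER_LINE."""
    block = n // NUM_BANKS
    core, offset = divmod(i, block)
    line, within = divmod(offset, ELEMS_PER_LINE)
    return (line * NUM_BANKS + core) * ELEMS_PER_LINE + within


def local_fraction(layout, n):
    """Fraction of accesses served by the accessing core's own bank when
    core c touches logical elements [c * block, (c + 1) * block)."""
    block = n // NUM_BANKS
    local = sum(1 for i in range(n) if home_bank(layout(i, n)) == i // block)
    return local / n


if __name__ == "__main__":
    n = NUM_BANKS * ELEMS_PER_LINE * 16
    print("canonical layout:  ", local_fraction(canonical, n))
    print("transformed layout:", local_fraction(transformed, n))
```

In this toy setting the canonical layout leaves each core with only one in sixteen of its lines in its own bank, while the remapped layout keeps every access local. Real programs require the index-set and reference-function analysis described in the summary to decide whether such a remapping exists.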