Fine Grain Cache Partitioning Using Per-Instruction Working Blocks

Bibliographic Details
Published in: 2015 International Conference on Parallel Architecture and Compilation (PACT), pp. 305-316
Main Authors: Park, Jason Jong Kyu; Park, Yongjun; Mahlke, Scott
Format: Conference Proceeding
Language: English
Published: IEEE, 01-10-2015
Description
Summary: A traditional least-recently used (LRU) cache replacement policy fails to match the performance of the optimal replacement policy when cache blocks with diverse reuse characteristics interfere with one another. When multiple applications share a cache, the cache is often partitioned among the applications because cache blocks exhibit similar reuse characteristics within each application. In this paper, we extend this idea to a single application by viewing the cache as a resource shared among individual memory instructions. To that end, we propose Instruction-based LRU (ILRU), a fine-grain cache partitioning scheme that way-partitions individual cache sets based on per-instruction working blocks, the cache blocks an instruction requires to satisfy all of its reuses within a set. In ILRU, a memory instruction steals a block from another instruction only when it requires more blocks than it currently holds; otherwise, it selects a victim among the cache blocks it inserted itself. Experiments show that ILRU improves cache performance at all levels of the hierarchy, reducing the number of misses by an average of 7.0% for L1, 9.1% for L2, and 8.7% for L3, which results in a geometric mean performance improvement of 5.3%. ILRU for a three-level cache hierarchy imposes a modest 1.3% storage overhead over the total cache size.
ISSN: 1089-795X, 2641-7944
DOI: 10.1109/PACT.2015.11
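
The summary above sketches the ILRU victim-selection rule: a memory instruction may steal a block from another instruction only when it currently holds fewer blocks in the set than its working-block allocation; otherwise it evicts one of its own blocks. The snippet below is a minimal, hypothetical illustration of that rule for a single cache set. The names ILRUSet and allocation, and the PC-to-ways mapping, are assumptions made for illustration only; the paper's hardware design derives per-instruction working blocks and way partitions dynamically rather than from a fixed table.

# A minimal sketch of the ILRU victim-selection idea from the summary,
# modeled for one cache set. ILRUSet and 'allocation' are illustrative
# assumptions, not the authors' implementation.
from collections import OrderedDict

class ILRUSet:
    """One cache set; each resident block is tagged with the PC that inserted it."""

    def __init__(self, num_ways, allocation):
        self.num_ways = num_ways
        self.allocation = allocation      # assumed map: PC -> ways this instruction may hold
        self.blocks = OrderedDict()       # tag -> inserting PC, kept in LRU order (oldest first)

    def _owned(self, pc):
        return sum(1 for p in self.blocks.values() if p == pc)

    def access(self, tag, pc):
        """Return True on a hit; on a miss, insert 'tag' using ILRU victim choice."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # hit: promote to most recently used
            return True
        if len(self.blocks) >= self.num_ways:
            if self._owned(pc) < self.allocation.get(pc, 0):
                # Instruction holds fewer blocks than its working set needs:
                # steal the least recently used block inserted by another instruction.
                victim = next((t for t, p in self.blocks.items() if p != pc),
                              next(iter(self.blocks)))
            else:
                # Otherwise victimize among the blocks this instruction inserted itself,
                # falling back to plain LRU if it owns none.
                victim = next((t for t, p in self.blocks.items() if p == pc),
                              next(iter(self.blocks)))
            del self.blocks[victim]
        self.blocks[tag] = pc             # insert new block as most recently used
        return False

For example, ILRUSet(num_ways=4, allocation={0x400: 3, 0x408: 1}) would let the load at PC 0x400 keep up to three blocks in the set before it begins evicting its own entries; both PC values are made up for the example.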