PS directory: a scalable multilevel directory cache for CMPs
As the number of cores increases in current and future chip-multiprocessor (CMP) generations, coherence protocols must rely on novel hardware structures to scale in terms of performance, power, and area. Systems that use directory information for coherence purposes are currently the most scalable al...
Saved in:
Published in: | The Journal of supercomputing Vol. 71; no. 8; pp. 2847 - 2876 |
---|---|
Main Authors: | , , , |
Format: | Journal Article |
Language: | English |
Published: |
New York
Springer US
01-08-2015
|
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | As the number of cores increases in current and future chip-multiprocessor (CMP) generations, coherence protocols must rely on novel hardware structures to scale in terms of performance, power, and area. Systems that use directory information for coherence purposes are currently the most scalable alternative. This paper studies the important differences between the directory behavior of private and shared blocks, which claim for a separate management of both types of blocks at the directory. We propose the PS directory, a two-level directory cache that keeps the reduced number of frequently accessed shared entries in a small and fast first-level cache, namely
Shared
cache, and uses a larger and slower second-level
Private
cache to track the large amount of private blocks. Entries in the Private cache do not implement the sharer vector, which allows important silicon area savings. Speed and area reasons suggest the use of eDRAM technology, much denser but slower than SRAM technology, for the Private cache, which in turn brings energy savings. Experimental results for a 16-core CMP show that, compared to a conventional directory, the PS directory improves performance by 14
%
while reducing silicon area and energy consumption by 34 and 27
%
, respectively. Also, compared to the state-of-the-art Multi-Grain Directory, the PS directory apart from increasing performance, it reduces power by 18.7
%
, and provides more scalability in terms of area. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
ISSN: | 0920-8542 1573-0484 |
DOI: | 10.1007/s11227-014-1332-5 |