Adaptive In-Cache Streaming for Efficient Data Management

Bibliographic Details
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 25, No. 7, pp. 2130-2143
Main Authors: Neves, Nuno, Tomas, Pedro, Roma, Nuno
Format: Journal Article
Language:English
Published: IEEE 01-07-2017
Description
Summary: The design of adaptive architectures is frequently focused on the sole adaptation of the processing blocks, often neglecting the power/performance impact of data transfers and data indexing in the memory subsystem. In particular, conventional address-based models, supported by cache structures to mitigate the memory wall problem, often struggle when dealing with memory-bound applications or arbitrarily complex data patterns that can hardly be captured by prefetching mechanisms. Stream-based techniques have proven to efficiently tackle such limitations, although they are not well suited to handle all types of applications. To mitigate the limitations of both communication paradigms, an efficient unification is herein proposed, by means of a novel in-cache stream paradigm, capable of seamlessly adapting the communication between the address-based and stream-based models. The proposed morphable infrastructure relies on a new dynamic descriptor graph specification, capable of handling regular, arbitrarily complex data patterns, which improves main memory bandwidth utilization through data reutilization and reorganization techniques. When compared with state-of-the-art solutions, the proposed structure offers higher address generation efficiency and achievable memory throughput, together with a significant reduction in the amount of data transfers and main memory accesses, resulting, on average, in a 13x system performance speedup and a 245x energy-delay product improvement over previous implementations.
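To make the descriptor-based streaming idea concrete, the following is a minimal sketch of how a descriptor graph can encode a nested (e.g., tiled, 2-D) access pattern and be expanded into an element-index stream. The `descriptor` struct, its fields, and the `expand` helper are illustrative assumptions, not the paper's actual specification; the paper's dynamic descriptor graphs are richer (they are built in hardware and support data reutilization and reorganization).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream descriptor (illustrative only): each node encodes
 * an affine access pattern (offset, stride, count) and may link to an
 * inner descriptor, forming a small graph that describes nested
 * patterns such as a 2-D tile walk. */
typedef struct descriptor {
    size_t offset;                  /* start offset from the stream base  */
    size_t stride;                  /* distance between consecutive steps */
    size_t count;                   /* iterations at this level           */
    const struct descriptor *inner; /* nested pattern, NULL for a leaf    */
} descriptor;

/* Expand a descriptor graph into a flat list of element indices.
 * Returns the number of indices written to `out` (at most `max`). */
static size_t expand(const descriptor *d, size_t base,
                     size_t *out, size_t max) {
    size_t n = 0;
    for (size_t i = 0; i < d->count; i++) {
        size_t addr = base + d->offset + i * d->stride;
        if (d->inner) {
            n += expand(d->inner, addr, out + n, max - n);
        } else if (n < max) {
            out[n++] = addr;
        }
    }
    return n;
}
```

For example, an outer descriptor with stride 8 and count 2, linked to an inner leaf with stride 1 and count 3, describes a 2x3 tile of an 8-element-wide row-major array and expands to the index stream 0, 1, 2, 8, 9, 10. A hardware address generator would iterate such a graph directly instead of materializing the list.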
ISSN: 1063-8210; 1557-9999
DOI: 10.1109/TVLSI.2017.2671405