Reducing off-chip memory traffic in deep CNNs using stick buffer cache

Bibliographic Details
Published in: 2017 25th Telecommunication Forum (TELFOR), pp. 1-4
Main Authors: Rakanovic, Damjan, Erdeljan, Andrea, Vranjkovic, Vuk, Vukobratovic, Bogdan, Teodorovic, Predrag, Struharik, Rastislav
Format: Conference Proceeding
Language: English
Published: IEEE, 01-11-2017
Description
Summary: Recent studies show that traffic between Convolutional Neural Network (CNN) accelerators and off-chip memory becomes critical with respect to energy consumption, as networks become deeper in order to improve performance. This is especially important for low-power embedded applications. Since on-chip data transfer is much less expensive in terms of power consumption, a significant improvement can be obtained by caching and reusing previously transferred off-chip data. However, due to the unique access pattern required by convolution computations within CNNs, standard cache memories would not be efficient for this purpose. In this paper, we propose an intelligent on-chip memory architecture that enables caching and a significant reduction of feature-map transfers from off-chip memory during the computation of convolutional layers in CNNs. Experimental results show that the proposed scheme can reduce off-chip feature-map traffic by up to 98.5% per convolutional layer for AlexNet and by 89% for each convolutional layer of VGG-16.
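The traffic savings claimed in the summary come from reusing input feature-map rows that overlapping convolution windows share. The sketch below is a simplified back-of-the-envelope model, not the paper's actual stick buffer architecture: it assumes that without caching every output position re-fetches its full K×K×C input window from off-chip memory, while a row cache holding the K rows a sliding window needs lets each input value be fetched exactly once. The layer dimensions used (224×224×64, 3×3 kernel) are illustrative values in the style of an early VGG-16 layer.

```python
# Simplified model (not the paper's exact scheme) of input feature-map
# off-chip traffic for one convolutional layer, with and without caching.

def naive_traffic(h, w, c, k, stride=1):
    """Values fetched when every output position re-reads its window."""
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    # Each of the out_h * out_w output positions reads a k*k*c window.
    return out_h * out_w * k * k * c

def cached_traffic(h, w, c):
    """Values fetched when an on-chip row cache allows each input
    value to be brought on-chip only once."""
    return h * w * c

# Illustrative VGG-16-style layer: 224x224x64 input, 3x3 kernel, stride 1.
h, w, c, k = 224, 224, 64, 3
naive = naive_traffic(h, w, c, k)
cached = cached_traffic(h, w, c)
reduction = 1 - cached / naive
print(f"off-chip input traffic reduction: {reduction:.1%}")
```

Under these simplifying assumptions the model yields a reduction close to the ~89% figure reported for VGG-16 layers; the paper's measured numbers additionally account for output feature-map traffic and the finite size of the on-chip buffer.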
DOI:10.1109/TELFOR.2017.8249398