Reducing off-chip memory traffic in deep CNNs using stick buffer cache
| Published in: | 2017 25th Telecommunication Forum (TELFOR), pp. 1 - 4 |
|---|---|
| Main Authors: | , , , , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01-11-2017 |
| Summary: | Recent studies show that traffic between Convolutional Neural Network (CNN) accelerators and off-chip memory becomes critical with respect to energy consumption as networks become deeper in order to improve performance. This is especially important for low-power embedded applications. Since on-chip data transfer is much less expensive in terms of power consumption, significant improvement can be obtained by caching and reusing previously transferred off-chip data. However, the unique access pattern required by convolution computations in CNNs makes standard cache memories inefficient for this purpose. In this paper, we propose an intelligent on-chip memory architecture that caches feature maps and significantly reduces their transfer from off-chip memory during the computation of convolutional layers in CNNs. Experimental results show that the proposed scheme can reduce off-chip feature map traffic by up to 98.5% per convolutional layer for AlexNet and by 89% for each convolutional layer of VGG-16. |
| DOI: | 10.1109/TELFOR.2017.8249398 |
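
The summary's central claim is that reusing previously fetched feature-map data on-chip cuts off-chip traffic. The following is a minimal, illustrative sketch of that reuse principle, not the paper's actual stick buffer architecture: a toy line-buffer model that counts off-chip row fetches as a K×K convolution window slides vertically over one input feature map. The function name, its parameters, and the FIFO replacement policy are all assumptions made for illustration.

```python
# Toy model (illustrative only): counts input rows fetched from off-chip
# memory while a KxK convolution window slides down a feature map, given
# an on-chip buffer holding the `cached_rows` most recently fetched rows.

def offchip_row_fetches(height, k, cached_rows):
    """Return how many row fetches hit off-chip memory.

    height      -- number of rows in the input feature map
    k           -- convolution kernel height (KxK window)
    cached_rows -- capacity of the on-chip row buffer (0 = no reuse)
    """
    fetched = 0
    resident = []  # row indices currently held on-chip (FIFO order)
    for out_row in range(height - k + 1):
        for r in range(out_row, out_row + k):  # rows the window needs
            if r not in resident:
                fetched += 1            # miss: pull the row from DRAM
                resident.append(r)
                if len(resident) > cached_rows:
                    resident.pop(0)     # evict the oldest cached row
    return fetched

if __name__ == "__main__":
    H, K = 224, 3
    no_reuse = offchip_row_fetches(H, K, cached_rows=0)
    with_buffer = offchip_row_fetches(H, K, cached_rows=K)
    print(f"row fetches without reuse:   {no_reuse}")
    print(f"row fetches with {K}-row buffer: {with_buffer}")
    print(f"traffic reduction: {1 - with_buffer / no_reuse:.1%}")
```

This toy model captures only vertical row reuse within a single feature map, which already removes roughly two thirds of the fetches for a 3×3 kernel; the much larger reductions the paper reports (up to 98.5%) come from its dedicated stick buffer organization, whose details the abstract does not describe.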