C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization
Published in: 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1-6
Main Authors:
Format: Conference Proceeding
Language: English
Published: IEEE, 05-06-2016
Summary: Convolutional neural network (CNN) accelerators have been proposed as an efficient hardware solution for deep learning applications, which are known to be both compute- and memory-intensive. Although the most advanced CNN accelerators can deliver high computational throughput, their performance is highly unstable: once changed to accommodate a new network with different parameters such as layer count and kernel size, the fixed hardware structure may no longer match the data flows well. Consequently, the accelerator fails to deliver high performance because either logic resources or memory bandwidth are underutilized. To overcome this problem, we propose a novel deep learning accelerator that offers three types of data-level parallelism: inter-kernel, intra-kernel, and hybrid. Our design adaptively switches among the three types of parallelism and the corresponding data-tiling schemes to match different networks, or even different layers of a single network. Regardless of the hardware configuration or network type, the proposed network-mapping strategy ensures optimal performance and energy efficiency. Compared with previous state-of-the-art NN accelerators, our design achieves a speedup of 4.0x-8.3x on some layers of well-known large-scale CNNs. Over the whole forward-propagation phase of a network, it achieves 28.04% PE energy savings and 90.3% on-chip memory energy savings on average.
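To make the abstract's terminology concrete, below is a minimal C sketch of a convolution loop nest annotated with where the two base parallelism schemes apply. This is an illustrative assumption, not the paper's design: all dimensions (K, C, H, R), array names, and the loop organization are hypothetical toy choices, and the comments only indicate which loops an accelerator would map across its PEs under each scheme.

```c
#include <stdio.h>

/* Hypothetical toy convolution layer (stride 1, no padding).
 * inter-kernel parallelism: spread the k loop (output feature maps)
 *   across PEs, so each PE applies a different kernel to shared input.
 * intra-kernel parallelism: spread the c/r/s loops (one kernel's
 *   window) across PEs, reducing partial sums for a single output.
 * A hybrid scheme would split PEs across both loop groups at once. */
#define K 4              /* output feature maps (kernels) */
#define C 3              /* input channels */
#define H 8              /* input height and width */
#define R 3              /* kernel height and width */
#define O (H - R + 1)    /* output height and width */

static float in[C][H][H], w[K][C][R][R], out[K][O][O];

int main(void) {
    /* deterministic dummy data so the example is self-contained */
    for (int c = 0; c < C; c++)
        for (int y = 0; y < H; y++)
            for (int x = 0; x < H; x++)
                in[c][y][x] = (float)(c + y + x) * 0.01f;
    for (int k = 0; k < K; k++)
        for (int c = 0; c < C; c++)
            for (int r = 0; r < R; r++)
                for (int s = 0; s < R; s++)
                    w[k][c][r][s] = (float)(k - c) * 0.1f;

    for (int y = 0; y < O; y++)
        for (int x = 0; x < O; x++)
            /* inter-kernel: map iterations of this k loop to PEs */
            for (int k = 0; k < K; k++) {
                float acc = 0.0f;
                /* intra-kernel: map the c/r/s iterations to PEs
                 * instead, summing their partial products */
                for (int c = 0; c < C; c++)
                    for (int r = 0; r < R; r++)
                        for (int s = 0; s < R; s++)
                            acc += in[c][y + r][x + s] * w[k][c][r][s];
                out[k][y][x] = acc;
            }

    printf("out[0][0][0] = %f\n", out[0][0][0]);
    return 0;
}
```

The trade-off the abstract targets follows from this structure: a layer with many kernels but small windows favors the inter-kernel mapping, while a layer with few kernels but large windows favors the intra-kernel one, so a fixed choice underutilizes PEs on one layer type or the other.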
DOI: 10.1145/2897937.2897995