A Winograd-Based CNN Accelerator with a Fine-Grained Regular Sparsity Pattern

Bibliographic Details
Published in: 2020 30th International Conference on Field-Programmable Logic and Applications (FPL), pp. 254-261
Main Authors: Yang, Tao; Liao, Yunkun; Shi, Jianping; Liang, Yun; Jing, Naifeng; Jiang, Li
Format: Conference Proceeding
Language: English
Published: IEEE, 01-08-2020
Description
Summary: Field-Programmable Gate Arrays (FPGAs) are a high-performance computing platform for Convolutional Neural Network (CNN) inference. Winograd transformation and weight pruning are widely adopted to reduce the storage and arithmetic overhead of the matrix multiplications in CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain; however, this yields irregular sparse patterns that lead to low parallelism and reduced resource utilization. In this paper, we propose a regular sparse pruning pattern for Winograd-based CNNs, the Sub-Row-Balanced Sparsity (SRBS) pattern, to overcome this challenge. We then develop a two-step hardware co-optimization approach to improve model accuracy under the SRBS pattern. Finally, we design an FPGA accelerator that exploits the SRBS pattern to eliminate low-parallelism computation and irregular memory accesses. Experimental results on VGG16 and ResNet-18 with CIFAR-10 and ImageNet show up to 4.4x and 3.06x speedup over the state-of-the-art dense Winograd accelerator, and a 52% performance improvement (theoretical upper bound: 72%) over the state-of-the-art sparse Winograd accelerator. The resulting sparsity ratios are 80% and 75%, respectively, with negligible loss of model accuracy.
ISSN: 1946-1488
DOI: 10.1109/FPL50879.2020.00050
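
For context on the techniques the summary refers to, the sketch below shows the standard Winograd F(2x2, 3x3) tile computation (the domain in which the weights are pruned) and one plausible way a sub-row-balanced mask could be constructed: within each fixed-width sub-row of the Winograd-domain weight matrix, the same number of largest-magnitude weights is kept, giving a regular sparsity pattern. This is a minimal NumPy illustration, not the paper's accelerator or pruning algorithm; the block width, keep count, and the (16 tile positions) x (channels) weight layout are assumptions made for the example.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices (Lavin & Gray).
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
Bt = np.array([[1.0, 0.0, -1.0, 0.0],
               [0.0, 1.0, 1.0, 0.0],
               [0.0, -1.0, 1.0, 0.0],
               [0.0, 1.0, 0.0, -1.0]])
At = np.array([[1.0, 1.0, 1.0, 0.0],
               [0.0, 1.0, -1.0, -1.0]])

def winograd_tile(d, g):
    """One 2x2 output tile: Y = A^T [(G g G^T) * (B^T d B)] A."""
    U = G @ g @ G.T        # 4x4 Winograd-domain weights (what gets pruned)
    V = Bt @ d @ Bt.T      # 4x4 transformed input tile
    return At @ (U * V) @ At.T

def sub_row_balanced_mask(W, block=4, keep=1):
    """Keep the `keep` largest-magnitude entries in every `block`-wide
    sub-row, so each sub-row ends up with the same nonzero count.
    (Illustrative only; the paper's exact sub-row partitioning,
    block size, and pruning criterion may differ.)"""
    mask = np.zeros_like(W)
    for r in range(W.shape[0]):
        for c0 in range(0, W.shape[1], block):
            seg = np.abs(W[r, c0:c0 + block])
            mask[r, c0 + np.argsort(seg)[-keep:]] = 1.0
    return mask

rng = np.random.default_rng(0)

# Sanity check: the Winograd tile matches direct 3x3 cross-correlation.
d, g = rng.standard_normal((4, 4)), rng.standard_normal((3, 3))
ref = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)]
                for i in range(2)])
assert np.allclose(winograd_tile(d, g), ref)

# Toy SRBS example: Winograd-domain weights of 8 input channels,
# arranged as (16 tile positions) x (8 channels), pruned to 75% sparsity.
U = np.stack([G @ rng.standard_normal((3, 3)) @ G.T for _ in range(8)], axis=-1)
W = U.reshape(16, 8)
mask = sub_row_balanced_mask(W, block=4, keep=1)
per_sub_row = mask.reshape(16, 2, 4).sum(axis=-1)
print("nonzeros per 4-wide sub-row:", per_sub_row.min(), "to", per_sub_row.max())
print("overall sparsity:", 1.0 - mask.mean())
```

Because every sub-row retains the same number of nonzeros, the nonzero weights can be packed into dense, equally sized groups, which is what lets a hardware design keep its multiplier lanes uniformly busy and its memory accesses regular, in contrast to unstructured Winograd-domain pruning.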