A High-Performance CNN Processor Based on FPGA for MobileNets
Published in: | 2019 29th International Conference on Field Programmable Logic and Applications (FPL) pp. 136 - 143 |
---|---|
Main Authors: | , , , , , , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-09-2019 |
Summary: | Convolutional neural networks (CNNs) have been widely applied to computer vision tasks. However, standard CNNs are hard to deploy on embedded devices because of their large number of operations and parameters. MobileNet, a state-of-the-art CNN that replaces standard convolution with depthwise separable convolution, significantly reduces operations and parameters with only a limited loss in accuracy. This paper proposes a high-performance CNN processor based on FPGA. To improve efficiency, two dedicated computing engines, named Conv Engine and Dwcv Engine, were designed for pointwise convolution and depthwise convolution, respectively. The scheduling of Conv Engine and Dwcv Engine significantly improves the efficiency of our accelerator. Furthermore, we designed a special architecture called Channel Augmentation to improve efficiency in the first layer of MobileNets. The accelerator can be flexibly deployed on various devices with different configurations to balance hardware resources and computational performance. We implemented our accelerator on ZU2 and ZU9 MPSoC FPGAs. Classification on ImageNet achieved 205.3 frames per second (fps) on ZU2 and 809.8 fps on ZU9, a 15.4x speedup on ZU2 and a 60.7x speedup on ZU9 compared with a CPU. We also deployed the MobileNet + SSD network on our accelerator for object detection, achieving 31.0 fps on ZU2 and 124.3 fps on ZU9. |
---|---|
ISSN: | 1946-1488 |
DOI: | 10.1109/FPL.2019.00030 |
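The summary rests on the claim that depthwise separable convolution cuts operations and parameters relative to standard convolution, and the processor splits that work between a depthwise engine and a pointwise engine. The sketch below is an illustrative cost model, not code from the paper. The names `ConvLayer`, `standard_macs`, `depthwise_macs`, `pointwise_macs`, `separable_macs`, `standard_params` and `separable_params` are hypothetical. It assumes stride 1, padding that keeps the output at D_F x D_F, no bias, and one multiply-accumulate (MAC) per kernel tap. The example layer (14x14x512 in and out, 3x3 kernel) matches the repeated middle block of MobileNet v1. The model also checks the result against the closed-form cost ratio 1/N + 1/D_K^2 derived in the original MobileNets work.

```python
"""Illustrative MAC/parameter cost model for depthwise separable convolution.

Assumptions (not from the paper): stride 1, padding that keeps the output
at D_F x D_F, no bias, and one multiply-accumulate (MAC) per kernel tap.
All names here are hypothetical.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    d_f: int  # spatial size of the square output feature map
    d_k: int  # size of the square standard / depthwise kernel
    m: int    # number of input channels
    n: int    # number of output channels


def standard_macs(layer: ConvLayer) -> int:
    # Each output pixel of each output channel sums a D_K x D_K window
    # over all M input channels.
    return layer.d_k * layer.d_k * layer.m * layer.n * layer.d_f * layer.d_f


def depthwise_macs(layer: ConvLayer) -> int:
    # One D_K x D_K filter per input channel; no cross-channel accumulation.
    return layer.d_k * layer.d_k * layer.m * layer.d_f * layer.d_f


def pointwise_macs(layer: ConvLayer) -> int:
    # 1x1 convolution: every output channel mixes all M channels per pixel.
    return layer.m * layer.n * layer.d_f * layer.d_f


def separable_macs(layer: ConvLayer) -> int:
    # Depthwise stage followed by pointwise stage.
    return depthwise_macs(layer) + pointwise_macs(layer)


def standard_params(layer: ConvLayer) -> int:
    return layer.d_k * layer.d_k * layer.m * layer.n


def separable_params(layer: ConvLayer) -> int:
    # Depthwise weights plus 1x1 pointwise weights.
    return layer.d_k * layer.d_k * layer.m + layer.m * layer.n


if __name__ == "__main__":
    # 14x14x512 -> 14x14x512 with 3x3 kernels (MobileNet v1 middle block).
    layer = ConvLayer(d_f=14, d_k=3, m=512, n=512)
    print("standard MACs: ", standard_macs(layer))   # 462422016
    print("depthwise MACs:", depthwise_macs(layer))  # 903168
    print("pointwise MACs:", pointwise_macs(layer))  # 51380224
    ratio = separable_macs(layer) / standard_macs(layer)
    closed_form = 1 / layer.n + 1 / layer.d_k ** 2
    print(f"cost ratio: {ratio:.4f} (1/N + 1/D_K^2 = {closed_form:.4f})")
```

For this layer the separable form needs roughly 11% of the standard MACs, and about 98% of the remaining work sits in the pointwise (1x1) stage. The depthwise stage, by contrast, is small and has no cross-channel accumulation. That difference in computation pattern is one plausible reason to give each stage its own engine, although the paper's actual engine design and scheduling are not reproduced here.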