Fast and Scalable Multicore YOLOv3-Tiny Accelerator Using Input Stationary Systolic Architecture
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 31, No. 11, pp. 1-14
Main Authors:
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01-11-2023
Summary: This article proposes a scalable accelerator for deep learning (DL) implementation on edge computing, which is often limited by power, storage, and computation speed. The accelerator is based on systolic array cores with 126 processing elements (PEs) and is optimized for YOLOv3-Tiny with 448 × 448 input images. Two multicast (MC) network architectures, feature map multicasting and weight multicasting, are introduced to control data-stream distribution within the multicores. Results show that the proposed weight multicast (W-MC) systems outperform the feature map multicast (FMAP-MC) systems in multicore scenarios, delivering up to 2.23× the frames per second (FPS). The 4-core W-MC system achieved the best efficiency, with an overall frame rate of 13.73 FPS/W and an overall throughput of 35.83 GOPS/W. The 8-core W-MC system delivered the best performance, with a frame rate of 38.50 FPS after normalization to the standard YOLOv3-Tiny network. Compared with previous state-of-the-art works, the proposed accelerator offers better computational efficiency and greater accelerator utilization in real-world inference scenarios.
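To illustrate the input-stationary dataflow named in the title, here is a minimal Python sketch (not the paper's hardware design, and all names are illustrative): each PE pins one input activation in place while weight values stream through, and partial sums accumulate along the PE chain to produce each output.

```python
def input_stationary_matvec(x, W):
    """Compute y = x @ W the way a 1-D input-stationary systolic pass would.

    PE i holds x[i] fixed ("input stationary"); for each output j, the
    weights W[i][j] stream past the PEs while a partial sum hops from
    PE to PE, so every PE reuses its pinned input on every cycle.
    """
    n_in, n_out = len(x), len(W[0])
    y = [0.0] * n_out
    for j in range(n_out):          # weight columns stream in over time
        acc = 0.0                   # partial sum entering the PE chain
        for i in range(n_in):       # sum accumulates through PEs 0..n_in-1
            acc += x[i] * W[i][j]   # PE i reuses its stationary x[i]
        y[j] = acc
    return y

# Tiny example: 3 inputs, 2 outputs.
x = [1.0, 2.0, 3.0]
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
print(input_stationary_matvec(x, W))  # [4.0, 5.0]
```

The point of pinning inputs is reuse: each activation is fetched once and then multiplied against every weight that streams past it, which is what makes the dataflow attractive under the memory-bandwidth limits of edge devices.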
ISSN: 1063-8210, 1557-9999
DOI: 10.1109/TVLSI.2023.3305937