Flexible and Fully Quantized Ultra-Lightweight TinyissimoYOLO for Ultra-Low-Power Edge Systems
Format: Journal Article
Language: English
Published: 14-07-2023
Summary: This paper deploys and explores variants of TinyissimoYOLO, a highly
flexible and fully quantized ultra-lightweight object detection network designed
for edge systems with a power envelope of a few milliwatts. Using experimental
measurements, we present a comprehensive characterization of the network's
detection performance, exploring the impact of various parameters, including
input resolution, number of object classes, and hidden-layer adjustments. We
deploy variants of TinyissimoYOLO on state-of-the-art ultra-low-power
extreme-edge platforms and present an in-depth comparison of their latency,
energy efficiency, and ability to parallelize the workload efficiently. In
particular, the paper compares a novel parallel RISC-V processor (GAP9 from
GreenWaves) with and without its on-chip hardware accelerator, an ARM Cortex-M7
core (STM32H7 from STMicroelectronics), two ARM Cortex-M4 cores (STM32L4 from
STMicroelectronics and Apollo4b from Ambiq), and a multi-core platform with a
CNN hardware accelerator (Analog Devices MAX78000).
Experimental results show that GAP9's hardware accelerator achieves the lowest
inference latency and energy, at 2.12 ms and 150 µJ respectively, which is
around 2× faster and 20% more energy-efficient than the next-best platform, the
MAX78000. The GAP9 hardware accelerator can even run a higher-resolution
version of TinyissimoYOLO, with a 112×112-pixel input and 10 detection classes,
within 3.2 ms while consuming 245 µJ. To showcase the competitiveness of a
versatile general-purpose system, we also deployed and profiled a multi-core
implementation on GAP9 at different operating points, achieving 11.3 ms in the
lowest-latency configuration and 490 µJ in the most energy-efficient
configuration. With this paper, we demonstrate the suitability and flexibility
of TinyissimoYOLO on state-of-the-art detection datasets for real-time
ultra-low-power edge inference.
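The latency and energy figures in the summary can be turned into two quick derived metrics: average power during an inference (energy divided by latency, where µJ/ms conveniently equals mW) and a ceiling on back-to-back inferences per second. The sketch below is illustrative only and assumes that each reported energy value covers exactly the reported inference window; the function names are hypothetical and do not come from the paper. It also estimates the inference-only power contribution at a chosen frame rate, ignoring idle, sleep, sensor, and pre/post-processing power.

```python
"""Derived metrics from the reported GAP9 hardware-accelerator figures.

Assumption (not from the paper): each reported energy value covers exactly the
reported inference latency, so average power = energy / latency.
All function names here are hypothetical, for illustration only.
"""

# (latency in ms, energy in uJ), taken from the summary.
REPORTED = {
    "GAP9 HW accelerator, TinyissimoYOLO": (2.12, 150.0),
    "GAP9 HW accelerator, 112x112 input, 10 classes": (3.2, 245.0),
}


def average_power_mw(latency_ms: float, energy_uj: float) -> float:
    """Average power during one inference; uJ / ms is numerically mW."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return energy_uj / latency_ms


def max_inferences_per_second(latency_ms: float) -> float:
    """Upper bound on back-to-back inferences per second.

    Ignores image capture, pre/post-processing, and any idle time.
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def inference_power_at_rate_mw(latency_ms: float, energy_uj: float, fps: float) -> float:
    """Inference-only power when running `fps` inferences per second.

    uJ per second is 1e-3 mW, hence the division by 1000.
    Idle/sleep power between inferences is NOT included.
    """
    if fps < 0 or fps > max_inferences_per_second(latency_ms):
        raise ValueError("fps must be between 0 and the latency-bound maximum")
    return energy_uj * fps / 1000.0


if __name__ == "__main__":
    for name, (lat, en) in REPORTED.items():
        print(
            f"{name}: {average_power_mw(lat, en):.1f} mW while active, "
            f"at most {max_inferences_per_second(lat):.0f} inferences/s, "
            f"{inference_power_at_rate_mw(lat, en, 10.0):.2f} mW at 10 fps"
        )
```

Under this assumption, the accelerator draws roughly 71 mW while actively inferring, but at 10 inferences per second the inference energy alone averages only 1.5 mW, which illustrates how duty cycling can keep such a workload within a few-milliwatt budget.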
DOI: 10.48550/arxiv.2307.05999