Union SRAM: PVT Variation Auto-Compensated, Bit Precision Configurable Current Mode 8T SRAM in Memory MAC Macro
Published in: | IEEE Access, Vol. 12, pp. 162882-162893 |
---|---|
Main Authors: | , , , , |
Format: | Journal Article |
Language: | English |
Published: | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024 |
Summary: | SRAM-based Compute-In-Memory (CIM) has two main paradigms, digital-domain and analog-domain, both of which have been extensively explored to overcome the von Neumann bottleneck and enhance energy efficiency. Digital CIM offers robustness and dynamic bit-precision through bit-wise and bit-serial computing, but suffers from limited throughput due to multi-cycle operations and degraded area density due to large hardware footprints. In contrast, Analog CIM offers significantly improved throughput and area density owing to its analog computing nature and simple logic structure. However, its weight and input data are constrained to a fixed bit-precision, limiting flexibility in DNN applications. Additionally, Analog CIM is susceptible to process, voltage, and temperature (PVT) variations, which can degrade accuracy. We present a solution to the limitations of analog-domain SRAM CIM in dynamic bit-precision configurability and PVT variation vulnerability. Our proposed 4b/8b bit-precision configurable analog current-mode 8T SRAM CIM architecture enhances DNN application flexibility. We also introduce a PVT variation auto-compensation scheme that maintains the computing accuracy of analog-domain CIM. Post-layout simulations confirm the architecture's efficacy, achieving a throughput of 170 to 793.4 GOPS, area efficiency of 0.227 to 1.06 TOPS/mm², and energy efficiency of 5.1 to 23.76 TOPS/W. Additionally, software-level simulation on the CIFAR-10 dataset demonstrates 95.02% classification accuracy. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3487975 |
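
The reported figures in the summary can be cross-checked with simple arithmetic: dividing throughput by area efficiency gives an implied macro area, and dividing throughput by energy efficiency gives an implied power. The sketch below does this in Python. It assumes that the low end of each reported range corresponds to the same operating point (and likewise for the high end); the record does not state this pairing, and the function names are illustrative, not taken from the paper.

```python
# Reported post-layout ranges from the summary, as (low end, high end).
THROUGHPUT_GOPS = (170.0, 793.4)
AREA_EFF_TOPS_PER_MM2 = (0.227, 1.06)
ENERGY_EFF_TOPS_PER_W = (5.1, 23.76)


def implied_area_mm2(gops: float, tops_per_mm2: float) -> float:
    """Area implied by throughput / area efficiency (GOPS converted to TOPS)."""
    return (gops / 1000.0) / tops_per_mm2


def implied_power_mw(gops: float, tops_per_w: float) -> float:
    """Power implied by throughput / energy efficiency, in milliwatts."""
    return (gops / 1000.0) / tops_per_w * 1000.0


# Assumption: low ends pair with low ends, high ends with high ends.
for label, gops, area_eff, energy_eff in zip(
    ("low end", "high end"),
    THROUGHPUT_GOPS,
    AREA_EFF_TOPS_PER_MM2,
    ENERGY_EFF_TOPS_PER_W,
):
    area = implied_area_mm2(gops, area_eff)
    power = implied_power_mw(gops, energy_eff)
    print(f"{label}: area = {area:.3f} mm^2, power = {power:.2f} mW")
```

Under that pairing assumption, both ends of the ranges imply roughly the same area (about 0.75 mm²) and roughly the same power (about 33 mW), which suggests the three ranges describe one macro at different configurations. These derived values are a consistency check only and are not figures stated in the record.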