With Shared Microexponents, A Little Shifting Goes a Long Way
Main Authors:
Format: Journal Article
Language: English
Published: 15-02-2023
Summary: This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.
DOI: 10.48550/arxiv.2302.08007
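The summary's idea of multi-level scaling can be made concrete with a minimal sketch: a coarse shared exponent per block of values, plus a tiny shared "microexponent" per small sub-block that shifts the scale when all of a sub-block's values are small. The function name `quantize_mx_like`, the block size (16), sub-block size (2), mantissa width (4 bits), and the 1-bit microexponent rule below are illustrative assumptions, not the paper's exact format specification.

```python
# A minimal NumPy sketch of two-level block scaling in the spirit of the
# MX formats described in the summary. All parameters are assumed for
# illustration; the record does not specify the actual format details.
import numpy as np

def quantize_mx_like(x, block=16, subblock=2, mant_bits=4):
    """Quantize a 1-D array with a per-block shared exponent plus a
    1-bit shared microexponent per sub-block (assumed parameters)."""
    x = np.asarray(x, dtype=np.float64)
    assert x.size % block == 0 and block % subblock == 0
    out = np.empty_like(x)
    for i in range(0, x.size, block):
        blk = x[i:i + block]
        # Coarse level: one shared exponent for the whole block,
        # taken from the block's largest magnitude.
        max_mag = np.max(np.abs(blk))
        e_block = int(np.floor(np.log2(max_mag))) if max_mag > 0 else 0
        for j in range(0, block, subblock):
            sub = blk[j:j + subblock]
            # Fine level: a 1-bit shared microexponent lets a sub-block
            # halve its scale when all of its values are small enough.
            sub_max = np.max(np.abs(sub))
            micro = 1 if 0 < sub_max <= 2.0 ** (e_block - 1) else 0
            scale = 2.0 ** (e_block - micro)
            # Round each element's mantissa to mant_bits fractional bits
            # and clip to the representable mantissa range.
            q = np.round(sub / scale * 2 ** mant_bits) / 2 ** mant_bits
            out[i + j:i + j + subblock] = np.clip(q, -2.0, 2.0) * scale
    return out

vals = np.random.randn(64)
err = np.abs(vals - quantize_mx_like(vals))
print("max abs quantization error:", err.max())
```

The design point this sketch illustrates is the "little shifting" of the title: the per-sub-block microexponent costs only a few extra bits per block but recovers much of the dynamic range lost when an entire block shares a single coarse exponent, as in classic block floating-point.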