With Shared Microexponents, A Little Shifting Goes a Long Way


Bibliographic Details
Main Authors: Rouhani, Bita, Zhao, Ritchie, Elango, Venmugil, Shafipour, Rasoul, Hall, Mathew, Mesmakhosroshahi, Maral, More, Ankit, Melnick, Levi, Golub, Maximilian, Varatkar, Girish, Shao, Lei, Kolhe, Gaurav, Melts, Dimitry, Klar, Jasmine, L'Heureux, Renee, Perry, Matt, Burger, Doug, Chung, Eric, Deng, Zhaoxia, Naghshineh, Sam, Park, Jongsoo, Naumov, Maxim
Format: Journal Article
Language: English
Published: 15-02-2023
Description
Summary: This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.
DOI:10.48550/arxiv.2302.08007
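
The summary's notion of "multiple levels of quantization scaling" can be illustrated with a toy two-level scheme: a coarse exponent shared by a whole block, plus a tiny per-sub-block exponent offset (the "microexponent"). The sketch below is an illustrative assumption for intuition only, not the exact MX encoding defined in the paper; the block and sub-block sizes, bit widths, and function name are all invented here, and a real encoder would also saturate mantissas to the stated bit width.

```python
import numpy as np

def two_level_block_quant(x, block=16, subblock=2, mant_bits=4, micro_bits=1):
    """Toy two-level scaled quantization in the spirit of shared
    microexponents: one coarse shared exponent per block, plus a small
    per-sub-block exponent offset. Illustrative only -- not the paper's
    exact MX format."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    qmax = 2 ** mant_bits - 1  # largest representable mantissa magnitude
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        # Level 1: coarse exponent shared by the entire block,
        # taken from the block's largest-magnitude element.
        shared_exp = np.floor(np.log2(np.max(np.abs(blk)) + 1e-30))
        for j in range(0, len(blk), subblock):
            sub = blk[j:j + subblock]
            # Level 2: micro-exponent -- a few-bit downward offset that
            # refines the scale for this sub-block (here {0, -1} when
            # micro_bits == 1).
            sub_max = np.max(np.abs(sub)) + 1e-30
            offset = np.clip(np.floor(np.log2(sub_max)) - shared_exp,
                             -(2 ** micro_bits - 1), 0)
            # Quantize and dequantize with the combined two-level scale.
            scale = 2.0 ** (shared_exp + offset) / qmax
            out[i + j:i + j + len(sub)] = np.round(sub / scale) * scale
    return out
```

Because the micro-exponent only ever shrinks the scale for sub-blocks whose values are smaller than the block maximum, small sub-blocks are quantized on a finer grid than single-level block floating-point would give them, which is the intuition behind "a little shifting goes a long way".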