Search Results - "Sharify, Sayeh"
1
Exploiting Typical Values to Accelerate Deep Learning
Published in Computer (Long Beach, Calif.) (01-05-2018)“…To deliver the hardware computation power advances needed to support deep learning innovations, identifying deep learning properties that designers could…”
Journal Article
2
Accelerating Image-Sensor-Based Deep Learning Applications
Published in IEEE MICRO (01-09-2019)“…We review two inference accelerators that exploit value properties in deep neural networks: 1) Diffy that targets spatially correlated activations in…”
Journal Article
3
Value-Based Deep-Learning Acceleration
Published in IEEE MICRO (01-01-2018)“…This article summarizes our recent work on value-based hardware accelerators for image classification using Deep Convolutional Neural Networks (CNNs). The…”
Journal Article
4
Bit-pragmatic deep neural network computing
Published in 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (14-10-2017)“…Deep Neural Networks expose a high degree of parallelism, making them amenable to highly data parallel architectures. However, data-parallel architectures…”
Conference Proceeding
5
Laconic Deep Learning Inference Acceleration
Published in 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA) (01-06-2019)“…We present a method for transparently identifying ineffectual computations during inference with Deep Learning models. Specifically, by decomposing…”
Conference Proceeding
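The Laconic abstract above is cut off, but its premise is decomposing multiplications into single-bit terms so that only pairs of set bits do useful work. Below is a minimal sketch of that accounting only, not of the accelerator; the unsigned 8-bit operands and the random data are assumptions for illustration.

```python
"""Counting effectual vs. total single-bit products when a multiplication
is decomposed into bit-level terms (a sketch, not the Laconic design)."""
import numpy as np

rng = np.random.default_rng(0)


def popcount(x: np.ndarray) -> np.ndarray:
    """Number of set bits in each non-negative integer."""
    x = x.astype(np.uint32)
    counts = np.zeros_like(x)
    for _ in range(32):
        counts += x & 1
        x >>= 1
    return counts


# Hypothetical 8-bit activation and weight magnitudes.
acts = rng.integers(0, 256, size=10_000)
wgts = rng.integers(0, 256, size=10_000)

# A bit-parallel 8x8 multiplier always forms 64 single-bit partial products.
total_terms = 64 * acts.size
# Decomposed term by term, only pairs of set bits contribute to the result.
effectual_terms = int((popcount(acts) * popcount(wgts)).sum())

print(f"effectual fraction: {effectual_terms / total_terms:.2%}")
```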
6
Late Breaking Results: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick
Published in 2020 57th ACM/IEEE Design Automation Conference (DAC) (01-07-2020)“…Data accesses between on- and off-chip memories account for a large fraction of overall energy consumption during inference with deep learning networks. We…”
Conference Proceeding
7
Understanding the difficulty of low-precision post-training quantization of large language models
Published 18-10-2024“…Large language models of high parameter counts are computationally expensive, yet can be made much more efficient by compressing their weights to very low…”
Journal Article
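The abstract above concerns compressing LLM weights to very low precision after training. As context, here is a minimal round-to-nearest (RTN) weight-quantization sketch; it is a generic baseline, not the paper's method, and the 4-bit width and per-row scaling are assumptions.

```python
"""Minimal round-to-nearest (RTN) post-training weight quantization sketch.
A generic baseline, not the method studied above; the 4-bit width and
per-output-channel scaling are illustrative assumptions."""
import numpy as np


def quantize_rtn(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-row (per-output-channel) integer quantization."""
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4 bits
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)     # avoid division by zero
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                             # dequantized weights


rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)
w_hat = quantize_rtn(w, bits=4)
print("mean squared quantization error:", float(np.mean((w - w_hat) ** 2)))
```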
8
Self-Selected Attention Span for Accelerating Large Language Model Inference
Published 14-04-2024“…Large language models (LLMs) can solve challenging tasks. However, their inference computation on modern GPUs is highly inefficient due to the increasing…”
Journal Article
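The snippet above is truncated before describing the mechanism, so the sketch below only illustrates the general idea of restricting each query's attention span with a fixed causal window; the paper's self-selection of the span is not shown, and the window size is an arbitrary assumption.

```python
"""Sketch of limiting the attention span with a fixed window; how the span
is self-selected per head or token is NOT modelled here - the window size
below is just an assumed constant for illustration."""
import numpy as np


def windowed_causal_attention(q, k, v, span: int):
    """Each query attends only to the most recent `span` keys (inclusive)."""
    t = q.shape[0]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    i = np.arange(t)[:, None]
    j = np.arange(t)[None, :]
    mask = (j > i) | (j < i - span + 1)          # future, or outside the span
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
t, d = 128, 64
q, k, v = (rng.normal(size=(t, d)) for _ in range(3))
out = windowed_causal_attention(q, k, v, span=16)
print(out.shape)  # (128, 64)
```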
9
Mixed-Precision Quantization with Cross-Layer Dependencies
Published 11-07-2023“…Quantization is commonly used to compress and accelerate deep neural networks. Quantization assigning the same bit-width to all layers leads to large accuracy…”
Journal Article
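The abstract contrasts uniform bit-width quantization with mixed precision. The sketch below assigns a hypothetical bit-width per layer and reuses a simple round-to-nearest quantizer; modelling the cross-layer dependencies that the paper targets is not attempted here, and the layer shapes and widths are assumptions.

```python
"""Per-layer bit-width assignment sketch: quantize each layer at its own
width and measure the error in isolation. Mixed-precision methods argue
these choices interact across layers; that dependency is not modelled."""
import numpy as np

rng = np.random.default_rng(0)
layers = {name: rng.normal(size=(512, 512)) for name in ("attn", "ffn_in", "ffn_out")}
bit_widths = {"attn": 8, "ffn_in": 4, "ffn_out": 6}   # hypothetical mixed assignment


def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-tensor round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale


for name, w in layers.items():
    err = np.mean((w - quantize(w, bit_widths[name])) ** 2)
    print(f"{name}: {bit_widths[name]}-bit, per-layer MSE = {err:.2e}")
```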
10
Scaling laws for post-training quantized large language models
Published 15-10-2024“…Generalization abilities of well-trained large language models (LLMs) are known to scale predictably as a function of model size. In contrast to the existence…”
Journal Article
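The abstract refers to the predictable scaling of LLM quality with model size. The sketch below fits the standard power-law form to made-up (size, loss) pairs purely to illustrate what such a scaling law looks like; none of the numbers come from the paper.

```python
"""Illustrative power-law fit of loss versus parameter count. The data
points are invented; only the functional form L(N) = a * N^(-b) is the
standard scaling-law assumption the abstract refers to."""
import numpy as np

# Hypothetical (parameter count, validation loss) pairs.
n_params = np.array([1e8, 1e9, 1e10, 1e11])
loss = np.array([3.9, 3.2, 2.7, 2.3])

# Fit log(loss) = log(a) - b * log(N) by linear least squares.
slope, log_a = np.polyfit(np.log(n_params), np.log(loss), deg=1)
a, b = np.exp(log_a), -slope
print(f"loss ~ {a:.2f} * N^(-{b:.3f})")
print("predicted loss at 1e12 params:", a * 1e12 ** (-b))
```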
11
Post Training Quantization of Large Language Models with Microscaling Formats
Published 11-05-2024“…Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant…”
Journal Article
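Microscaling (MX) formats store small blocks of values that share one scale alongside low-precision elements. The sketch below is a simplified integer-element variant of that idea; the block size, element width, and power-of-two scale encoding are assumptions, not the full MX specification.

```python
"""Simplified block-scaled quantization in the spirit of microscaling (MX)
formats: each small block shares one power-of-two scale and stores
low-precision elements. Block size and element width are assumptions."""
import numpy as np


def block_quantize(x: np.ndarray, block: int = 32, bits: int = 8) -> np.ndarray:
    """Quantize a 1-D tensor with one shared power-of-two scale per block."""
    pad = (-x.size) % block
    blocks = np.pad(x, (0, pad)).reshape(-1, block)
    qmax = 2 ** (bits - 1) - 1
    max_mag = np.abs(blocks).max(axis=1, keepdims=True)
    max_mag = np.where(max_mag == 0, 1.0, max_mag)
    scale = 2.0 ** np.ceil(np.log2(max_mag / qmax))   # shared power-of-two scale
    q = np.clip(np.round(blocks / scale), -qmax, qmax)
    return (q * scale).reshape(-1)[: x.size]


rng = np.random.default_rng(0)
x = rng.normal(size=1000).astype(np.float32)
x_hat = block_quantize(x)
print("MSE:", float(np.mean((x - x_hat) ** 2)))
```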
12
Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability
Published 27-07-2017“…Tartan (TRT), a hardware accelerator for inference with Deep Neural Networks (DNNs), is presented and evaluated on Convolutional Neural Networks. TRT exploits…”
Journal Article
13
Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks
Published 01-06-2017“…Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial computation to offer performance that is proportional to the fixed-point precision of…”
Journal Article
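Dynamic Stripes measures, at runtime, how many bits groups of activations actually need so that a bit-serial engine can stop early. The sketch below shows only that per-group bit-count detection; the 16-bit baseline, the group size, and the synthetic activations are assumptions.

```python
"""Sketch of detecting, per small group of fixed-point activations, how many
bits are actually needed, so a bit-serial engine can cut work accordingly."""
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical activations already quantized to unsigned 16-bit fixed point;
# most values are small, as is typical after ReLU.
acts = rng.exponential(scale=200.0, size=4096).astype(np.uint16)

groups = acts.reshape(-1, 16)
group_max = groups.max(axis=1).astype(np.uint32)
# Bits needed to represent the largest value in each group (at least 1).
needed_bits = np.maximum(np.ceil(np.log2(group_max + 1)), 1).astype(int)

baseline = 16 * groups.shape[0]                  # every group at full 16 bits
print("average bits per group:", needed_bits.mean())
print("bit-serial work vs. 16-bit baseline:", needed_bits.sum() / baseline)
```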
14
Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing
Published 28-04-2017“…We discuss several modifications and extensions over the previous proposed Cnvlutin (CNV) accelerator for convolutional and fully-connected layers of Deep…”
Journal Article
15
Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks
Published in 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) (01-06-2018)“…Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs) is presented. In LM every bit of data precision that can be saved…”
Conference Proceeding
16
Laconic Deep Learning Computing
Published 10-05-2018“…We motivate a method for transparently identifying ineffectual computations in unmodified Deep Learning models and without affecting accuracy. Specifically, we…”
Journal Article
17
DPRed: Making Typical Activation and Weight Values Matter In Deep Learning Computing
Published 16-04-2018“…We show that selecting a single data type (precision) for all values in Deep Neural Networks, even if that data type is different per layer, amounts to worst…”
Journal Article
18
Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks
Published 23-06-2017“…Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs) is presented. In LM every bit of data precision that can be saved…”
Journal Article
19
Identifying and Exploiting Ineffectual Computations to Enable Hardware Acceleration of Deep Learning
Published in 2018 16th IEEE International New Circuits and Systems Conference (NEWCAS) (01-06-2018)“…This article summarizes some of our work on hardware accelerators for inference with Deep Learning Neural Networks (DNNs). Early success in hardware…”
Conference Proceeding
20
Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How
Published 09-03-2018“…We show that, during inference with Convolutional Neural Networks (CNNs), more than 2x to 8x ineffectual work can be exposed if instead of targeting those…”
Journal Article
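Reading the truncated claim above as counting work that contributes nothing because a weight is zero or an activation bit is zero, the sketch below does that bookkeeping on synthetic data; the sparsity level and 8-bit values are assumptions, and the Bit-Tactical scheduler itself is not modelled.

```python
"""Sketch of the accounting behind 'ineffectual work': counting multiply
terms that contribute nothing because the weight is zero or the activation
bit is zero. Synthetic data; not the Bit-Tactical design itself."""
import numpy as np

rng = np.random.default_rng(0)


def popcount(x: np.ndarray) -> np.ndarray:
    """Number of set bits per element, for 8-bit magnitudes."""
    x = x.astype(np.uint32)
    total = np.zeros_like(x)
    for _ in range(8):
        total += x & 1
        x >>= 1
    return total


n = 100_000
wgts = rng.integers(0, 256, size=n) * (rng.random(n) > 0.6)   # ~60% zero weights (assumed)
acts = rng.integers(0, 256, size=n)

baseline_work = 8 * n                                # 8 activation bits per weight, bit-serially
effectual = int(popcount(acts[wgts != 0]).sum())     # set bits paired with nonzero weights
print(f"ineffectual fraction: {1 - effectual / baseline_work:.2%}")
print(f"potential work reduction: {baseline_work / effectual:.1f}x")
```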