Search Results - "Sharify, Sayeh"
1
Exploiting Typical Values to Accelerate Deep Learning
Published in Computer (Long Beach, Calif.) (01-05-2018)“…To deliver the hardware computation power advances needed to support deep learning innovations, identifying deep learning properties that designers could…”
Journal Article
2
Accelerating Image-Sensor-Based Deep Learning Applications
Published in IEEE MICRO (01-09-2019)“…We review two inference accelerators that exploit value properties in deep neural networks: 1) Diffy that targets spatially correlated activations in…”
Journal Article
3
Value-Based Deep-Learning Acceleration
Published in IEEE MICRO (01-01-2018)“…This article summarizes our recent work on value-based hardware accelerators for image classification using Deep Convolutional Neural Networks (CNNs). The…”
Journal Article
4
Bit-pragmatic deep neural network computing
Published in 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (14-10-2017)“…Deep Neural Networks expose a high degree of parallelism, making them amenable to highly data parallel architectures. However, data-parallel architectures…”
Conference Proceeding
5
Laconic Deep Learning Inference Acceleration
Published in 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA) (01-06-2019)“…We present a method for transparently identifying ineffectual computations during inference with Deep Learning models. Specifically, by decomposing…”
Conference Proceeding
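The Laconic abstract above is cut off, but its premise is decomposing multiplications into single-bit terms so that only pairs of set bits do useful work. Below is a minimal sketch of that accounting only, not of the accelerator; the unsigned 8-bit operands and the random data are assumptions for illustration.

```python
"""Counting effectual vs. total single-bit products when a multiplication
is decomposed into bit-level terms (a sketch, not the Laconic design)."""
import numpy as np

rng = np.random.default_rng(0)


def popcount(x: np.ndarray) -> np.ndarray:
    """Number of set bits in each non-negative integer."""
    x = x.astype(np.uint32)
    counts = np.zeros_like(x)
    for _ in range(32):
        counts += x & 1
        x >>= 1
    return counts


# Hypothetical 8-bit activation and weight magnitudes.
acts = rng.integers(0, 256, size=10_000)
wgts = rng.integers(0, 256, size=10_000)

# A bit-parallel 8x8 multiplier always forms 64 single-bit partial products.
total_terms = 64 * acts.size
# Decomposed term by term, only pairs of set bits contribute to the result.
effectual_terms = int((popcount(acts) * popcount(wgts)).sum())

print(f"effectual fraction: {effectual_terms / total_terms:.2%}")
```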
6
Late Breaking Results: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick
Published in 2020 57th ACM/IEEE Design Automation Conference (DAC) (01-07-2020)“…Data accesses between on- and off-chip memories account for a large fraction of overall energy consumption during inference with deep learning networks. We…”
Conference Proceeding
7
Understanding the difficulty of low-precision post-training quantization of large language models
Published 18-10-2024“…Large language models of high parameter counts are computationally expensive, yet can be made much more efficient by compressing their weights to very low…”
Journal Article
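The abstract above concerns compressing LLM weights to very low precision after training. As context, here is a minimal round-to-nearest (RTN) weight-quantization sketch; it is a generic baseline, not the paper's method, and the 4-bit width and per-row scaling are assumptions.

```python
"""Minimal round-to-nearest (RTN) post-training weight quantization sketch.
A generic baseline, not the method studied above; the 4-bit width and
per-output-channel scaling are illustrative assumptions."""
import numpy as np


def quantize_rtn(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric per-row (per-output-channel) integer quantization."""
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4 bits
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)     # avoid division by zero
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                             # dequantized weights


rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)
w_hat = quantize_rtn(w, bits=4)
print("mean squared quantization error:", float(np.mean((w - w_hat) ** 2)))
```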
8
Self-Selected Attention Span for Accelerating Large Language Model Inference
Published 14-04-2024“…Large language models (LLMs) can solve challenging tasks. However, their inference computation on modern GPUs is highly inefficient due to the increasing…”
Journal Article
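The snippet above is truncated before describing the mechanism, so the sketch below only illustrates the general idea of restricting each query's attention span with a fixed causal window; the paper's self-selection of the span is not shown, and the window size is an arbitrary assumption.

```python
"""Sketch of limiting the attention span with a fixed window; how the span
is self-selected per head or token is NOT modelled here - the window size
below is just an assumed constant for illustration."""
import numpy as np


def windowed_causal_attention(q, k, v, span: int):
    """Each query attends only to the most recent `span` keys (inclusive)."""
    t = q.shape[0]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    i = np.arange(t)[:, None]
    j = np.arange(t)[None, :]
    mask = (j > i) | (j < i - span + 1)          # future, or outside the span
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
t, d = 128, 64
q, k, v = (rng.normal(size=(t, d)) for _ in range(3))
out = windowed_causal_attention(q, k, v, span=16)
print(out.shape)  # (128, 64)
```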
9
Mixed-Precision Quantization with Cross-Layer Dependencies
Published 11-07-2023“…Quantization is commonly used to compress and accelerate deep neural networks. Quantization assigning the same bit-width to all layers leads to large accuracy…”
Journal Article
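The abstract contrasts uniform bit-width quantization with mixed precision. The sketch below assigns a hypothetical bit-width per layer and reuses a simple round-to-nearest quantizer; modelling the cross-layer dependencies that the paper targets is not attempted here, and the layer shapes and widths are assumptions.

```python
"""Per-layer bit-width assignment sketch: quantize each layer at its own
width and measure the error in isolation. Mixed-precision methods argue
these choices interact across layers; that dependency is not modelled."""
import numpy as np

rng = np.random.default_rng(0)
layers = {name: rng.normal(size=(512, 512)) for name in ("attn", "ffn_in", "ffn_out")}
bit_widths = {"attn": 8, "ffn_in": 4, "ffn_out": 6}   # hypothetical mixed assignment


def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-tensor round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale


for name, w in layers.items():
    err = np.mean((w - quantize(w, bit_widths[name])) ** 2)
    print(f"{name}: {bit_widths[name]}-bit, per-layer MSE = {err:.2e}")
```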
10
Scaling laws for post-training quantized large language models
Published 15-10-2024“…Generalization abilities of well-trained large language models (LLMs) are known to scale predictably as a function of model size. In contrast to the existence…”
Journal Article
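The abstract refers to the predictable scaling of LLM quality with model size. The sketch below fits the standard power-law form to made-up (size, loss) pairs purely to illustrate what such a scaling law looks like; none of the numbers come from the paper.

```python
"""Illustrative power-law fit of loss versus parameter count. The data
points are invented; only the functional form L(N) = a * N^(-b) is the
standard scaling-law assumption the abstract refers to."""
import numpy as np

# Hypothetical (parameter count, validation loss) pairs.
n_params = np.array([1e8, 1e9, 1e10, 1e11])
loss = np.array([3.9, 3.2, 2.7, 2.3])

# Fit log(loss) = log(a) - b * log(N) by linear least squares.
slope, log_a = np.polyfit(np.log(n_params), np.log(loss), deg=1)
a, b = np.exp(log_a), -slope
print(f"loss ~ {a:.2f} * N^(-{b:.3f})")
print("predicted loss at 1e12 params:", a * 1e12 ** (-b))
```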
11
Post Training Quantization of Large Language Models with Microscaling Formats
Published 11-05-2024“…Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant…”
Journal Article
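Microscaling (MX) formats store small blocks of values that share one scale alongside low-precision elements. The sketch below is a simplified integer-element variant of that idea; the block size, element width, and power-of-two scale encoding are assumptions, not the full MX specification.

```python
"""Simplified block-scaled quantization in the spirit of microscaling (MX)
formats: each small block shares one power-of-two scale and stores
low-precision elements. Block size and element width are assumptions."""
import numpy as np


def block_quantize(x: np.ndarray, block: int = 32, bits: int = 8) -> np.ndarray:
    """Quantize a 1-D tensor with one shared power-of-two scale per block."""
    pad = (-x.size) % block
    blocks = np.pad(x, (0, pad)).reshape(-1, block)
    qmax = 2 ** (bits - 1) - 1
    max_mag = np.abs(blocks).max(axis=1, keepdims=True)
    max_mag = np.where(max_mag == 0, 1.0, max_mag)
    scale = 2.0 ** np.ceil(np.log2(max_mag / qmax))   # shared power-of-two scale
    q = np.clip(np.round(blocks / scale), -qmax, qmax)
    return (q * scale).reshape(-1)[: x.size]


rng = np.random.default_rng(0)
x = rng.normal(size=1000).astype(np.float32)
x_hat = block_quantize(x)
print("MSE:", float(np.mean((x - x_hat) ** 2)))
```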
12
Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability
Published 27-07-2017“…Tartan (TRT), a hardware accelerator for inference with Deep Neural Networks (DNNs), is presented and evaluated on Convolutional Neural Networks. TRT exploits…”
Journal Article
13
Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks
Published 01-06-2017“…Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial computation to offer performance that is proportional to the fixed-point precision of…”
Journal Article
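Dynamic Stripes measures, at runtime, how many bits groups of activations actually need so that a bit-serial engine can stop early. The sketch below shows only that per-group bit-count detection; the 16-bit baseline, the group size, and the synthetic activations are assumptions.

```python
"""Sketch of detecting, per small group of fixed-point activations, how many
bits are actually needed, so a bit-serial engine can cut work accordingly."""
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical activations already quantized to unsigned 16-bit fixed point;
# most values are small, as is typical after ReLU.
acts = rng.exponential(scale=200.0, size=4096).astype(np.uint16)

groups = acts.reshape(-1, 16)
group_max = groups.max(axis=1).astype(np.uint32)
# Bits needed to represent the largest value in each group (at least 1).
needed_bits = np.maximum(np.ceil(np.log2(group_max + 1)), 1).astype(int)

baseline = 16 * groups.shape[0]                  # every group at full 16 bits
print("average bits per group:", needed_bits.mean())
print("bit-serial work vs. 16-bit baseline:", needed_bits.sum() / baseline)
```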
14
Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing
Published 28-04-2017“…We discuss several modifications and extensions over the previous proposed Cnvlutin (CNV) accelerator for convolutional and fully-connected layers of Deep…”
Journal Article
15
Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks
Published in 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) (01-06-2018)“…Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs) is presented. In LM every bit of data precision that can be saved…”
Conference Proceeding
16
Laconic Deep Learning Computing
Published 10-05-2018“…We motivate a method for transparently identifying ineffectual computations in unmodified Deep Learning models and without affecting accuracy. Specifically, we…”
Journal Article
17
DPRed: Making Typical Activation and Weight Values Matter In Deep Learning Computing
Published 16-04-2018“…We show that selecting a single data type (precision) for all values in Deep Neural Networks, even if that data type is different per layer, amounts to worst…”
Journal Article
18
Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks
Published 23-06-2017“…Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs) is presented. In LM every bit of data precision that can be saved…”
Journal Article
19
Identifying and Exploiting Ineffectual Computations to Enable Hardware Acceleration of Deep Learning
Published in 2018 16th IEEE International New Circuits and Systems Conference (NEWCAS) (01-06-2018)“…This article summarizes some of our work on hardware accelerators for inference with Deep Learning Neural Networks (DNNs). Early success in hardware…”
Conference Proceeding
20
Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How
Published 09-03-2018“…We show that, during inference with Convolutional Neural Networks (CNNs), more than 2x to 8x ineffectual work can be exposed if instead of targeting those…”
Journal Article
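Reading the truncated claim above as counting work that contributes nothing because a weight is zero or an activation bit is zero, the sketch below does that bookkeeping on synthetic data; the sparsity level and 8-bit values are assumptions, and the Bit-Tactical scheduler itself is not modelled.

```python
"""Sketch of the accounting behind 'ineffectual work': counting multiply
terms that contribute nothing because the weight is zero or the activation
bit is zero. Synthetic data; not the Bit-Tactical design itself."""
import numpy as np

rng = np.random.default_rng(0)


def popcount(x: np.ndarray) -> np.ndarray:
    """Number of set bits per element, for 8-bit magnitudes."""
    x = x.astype(np.uint32)
    total = np.zeros_like(x)
    for _ in range(8):
        total += x & 1
        x >>= 1
    return total


n = 100_000
wgts = rng.integers(0, 256, size=n) * (rng.random(n) > 0.6)   # ~60% zero weights (assumed)
acts = rng.integers(0, 256, size=n)

baseline_work = 8 * n                                # 8 activation bits per weight, bit-serially
effectual = int(popcount(acts[wgts != 0]).sum())     # set bits paired with nonzero weights
print(f"ineffectual fraction: {1 - effectual / baseline_work:.2%}")
print(f"potential work reduction: {baseline_work / effectual:.1f}x")
```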