Search Results - "Sharify, Sayeh"

  Showing 1-20 of 20 results
  1.

    Exploiting Typical Values to Accelerate Deep Learning by Moshovos, Andreas, Albericio, Jorge, Judd, Patrick, Lascorz, Alberto Delmas, Sharify, Sayeh, Poulos, Zissis, Hetherington, Tayler, Aamodt, Tor, Jerger, Natalie Enright

    Published in Computer (Long Beach, Calif.) (01-05-2018)
    “…To deliver the hardware computation power advances needed to support deep learning innovations, identifying deep learning properties that designers could…”
    Journal Article
  2.

    Accelerating Image-Sensor-Based Deep Learning Applications by Mahmoud, Mostafa, Stuart, Dylan Malone, Poulos, Zissis, Lascorz, Alberto Delmas, Judd, Patrick, Sharify, Sayeh, Nikolic, Milos, Siu, Kevin, Vivancos, Isak Edo, Albericio, Jorge, Moshovos, Andreas

    Published in IEEE Micro (01-09-2019)
    “…We review two inference accelerators that exploit value properties in deep neural networks: 1) Diffy that targets spatially correlated activations in…”
    Journal Article
  3.

    Value-Based Deep-Learning Acceleration by Moshovos, Andreas, Albericio, Jorge, Judd, Patrick, Delmas Lascorz, Alberto, Sharify, Sayeh, Hetherington, Tayler, Aamodt, Tor, Enright Jerger, Natalie

    Published in IEEE Micro (01-01-2018)
    “…This article summarizes our recent work on value-based hardware accelerators for image classification using Deep Convolutional Neural Networks (CNNs). The…”
    Journal Article
  4.

    Bit-pragmatic deep neural network computing by Albericio, Jorge, Delmás, Alberto, Judd, Patrick, Sharify, Sayeh, O'Leary, Gerard, Genov, Roman, Moshovos, Andreas

    “…Deep Neural Networks expose a high degree of parallelism, making them amenable to highly data parallel architectures. However, data-parallel architectures…”
    Conference Proceeding
  5.

    Laconic Deep Learning Inference Acceleration by Sharify, Sayeh, Lascorz, Alberto Delmas, Mahmoud, Mostafa, Nikolic, Milos, Siu, Kevin, Stuart, Dylan Malone, Poulos, Zissis, Moshovos, Andreas

    “…We present a method for transparently identifying ineffectual computations during inference with Deep Learning models. Specifically, by decomposing…”
    Conference Proceeding
  6.

    Late Breaking Results: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick by Vivancos, Isak Edo, Sharify, Sayeh, Nikolic, Milos, Bannon, Ciaran, Mahmoud, Mostafa, Lascorz, Alberto Delmas, Moshovos, Andreas

    “…Data accesses between on- and off-chip memories account for a large fraction of overall energy consumption during inference with deep learning networks. We…”
    Conference Proceeding
  7.

    Understanding the difficulty of low-precision post-training quantization of large language models by Xu, Zifei, Sharify, Sayeh, Yazar, Wanzin, Webb, Tristan, Wang, Xin

    Published 18-10-2024
    “…Large language models of high parameter counts are computationally expensive, yet can be made much more efficient by compressing their weights to very low…”
    Journal Article
  8.

    Self-Selected Attention Span for Accelerating Large Language Model Inference by Jin, Tian, Yazar, Wanzin, Xu, Zifei, Sharify, Sayeh, Wang, Xin

    Published 14-04-2024
    “…Large language models (LLMs) can solve challenging tasks. However, their inference computation on modern GPUs is highly inefficient due to the increasing…”
    Journal Article
  9.

    Mixed-Precision Quantization with Cross-Layer Dependencies by Deng, Zihao, Wang, Xin, Sharify, Sayeh, Orshansky, Michael

    Published 11-07-2023
    “…Quantization is commonly used to compress and accelerate deep neural networks. Quantization assigning the same bit-width to all layers leads to large accuracy…”
    Journal Article (see the per-layer quantization sketch after this results list)
  10.

    Scaling laws for post-training quantized large language models by Xu, Zifei, Lan, Alexander, Yazar, Wanzin, Webb, Tristan, Sharify, Sayeh, Wang, Xin

    Published 15-10-2024
    “…Generalization abilities of well-trained large language models (LLMs) are known to scale predictably as a function of model size. In contrast to the existence…”
    Journal Article
  11.

    Post Training Quantization of Large Language Models with Microscaling Formats by Sharify, Sayeh, Saxena, Utkarsh, Xu, Zifei, Yazar, Wanzin, Soloveychik, Ilya, Wang, Xin

    Published 11-05-2024
    “…Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant…”
    Journal Article (see the block-scaled quantization sketch after this results list)
  12.

    Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability by Delmas, Alberto, Sharify, Sayeh, Judd, Patrick, Moshovos, Andreas

    Published 27-07-2017
    “…Tartan (TRT), a hardware accelerator for inference with Deep Neural Networks (DNNs), is presented and evaluated on Convolutional Neural Networks. TRT exploits…”
    Journal Article
  13.

    Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks by Delmas, Alberto, Judd, Patrick, Sharify, Sayeh, Moshovos, Andreas

    Published 01-06-2017
    “…Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial computation to offer performance that is proportional to the fixed-point precision of…”
    Journal Article (see the bit-serial sketch after this results list)
  14.

    Cnvlutin2: Ineffectual-Activation-and-Weight-Free Deep Neural Network Computing by Judd, Patrick, Delmas, Alberto, Sharify, Sayeh, Moshovos, Andreas

    Published 28-04-2017
    “…We discuss several modifications and extensions over the previous proposed Cnvlutin (CNV) accelerator for convolutional and fully-connected layers of Deep…”
    Journal Article
  15.

    Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks by Sharify, Sayeh, Lascorz, Alberto Delmas, Siu, Kevin, Judd, Patrick, Moshovos, Andreas

    “…Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs) is presented. In LM every bit of data precision that can be saved…”
    Conference Proceeding
  16.

    Laconic Deep Learning Computing by Sharify, Sayeh, Mahmoud, Mostafa, Lascorz, Alberto Delmas, Nikolic, Milos, Moshovos, Andreas

    Published 10-05-2018
    “…We motivate a method for transparently identifying ineffectual computations in unmodified Deep Learning models and without affecting accuracy. Specifically, we…”
    Journal Article
  17.

    DPRed: Making Typical Activation and Weight Values Matter In Deep Learning Computing by Delmas, Alberto, Sharify, Sayeh, Judd, Patrick, Siu, Kevin, Nikolic, Milos, Moshovos, Andreas

    Published 16-04-2018
    “…We show that selecting a single data type (precision) for all values in Deep Neural Networks, even if that data type is different per layer, amounts to worst…”
    Journal Article
  18.

    Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks by Sharify, Sayeh, Lascorz, Alberto Delmas, Siu, Kevin, Judd, Patrick, Moshovos, Andreas

    Published 23-06-2017
    “…Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs) is presented. In LM every bit of data precision that can be saved…”
    Journal Article
  19.
  20.

    Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How by Delmas, Alberto, Judd, Patrick, Stuart, Dylan Malone, Poulos, Zissis, Mahmoud, Mostafa, Sharify, Sayeh, Nikolic, Milos, Moshovos, Andreas

    Published 09-03-2018
    “…We show that, during inference with Convolutional Neural Networks (CNNs), more than 2x to 8x ineffectual work can be exposed if instead of targeting those…”
    Journal Article
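
Notes on techniques referenced above (illustrative sketches, not code from the indexed papers):

Entry 9 concerns mixed-precision quantization, where each layer may receive a different bit-width. As background, a minimal Python sketch of symmetric uniform quantization with a selectable per-layer bit-width; every name below is illustrative, not taken from the paper:

```python
import numpy as np

def quantize_dequantize(x, bits):
    """Symmetric uniform quantization of a tensor to `bits` bits (illustrative)."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 127 for 8 bits
    scale = max(np.max(np.abs(x)) / qmax, 1e-12)   # per-tensor scale factor
    q = np.clip(np.round(x / scale), -qmax, qmax)  # snap to the integer grid
    return q * scale                               # dequantized approximation

rng = np.random.default_rng(0)
layer = rng.standard_normal(1024)
for b in (8, 4, 2):                                # a mixed-precision scheme picks b per layer
    mse = np.mean((layer - quantize_dequantize(layer, b)) ** 2)
    print(f"{b}-bit MSE: {mse:.2e}")               # error grows as the bit-width shrinks
```

Assigning one bit-width to all layers ignores that layers differ in sensitivity; the snippet for entry 9 notes exactly this, and its method additionally accounts for cross-layer dependencies when choosing per-layer precisions.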
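
Entry 11 evaluates microscaling (MX) formats for post-training quantization of LLMs. Below is a generic block-scaled sketch in the same spirit, where each small block of elements shares one power-of-two scale; this is an assumption-laden illustration, not the exact MX specification:

```python
import numpy as np

def block_quantize(x, bits=4, block=32):
    """Block-scaled quantization: each `block` elements share one
    power-of-two scale (MX-like in spirit; not the exact spec)."""
    qmax = 2 ** (bits - 1) - 1
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        chunk = x[i:i + block]
        amax = np.max(np.abs(chunk))
        # shared power-of-two scale chosen so the block's max value fits in range
        scale = 2.0 ** np.ceil(np.log2(amax / qmax)) if amax > 0 else 1.0
        out[i:i + block] = np.clip(np.round(chunk / scale), -qmax, qmax) * scale
    return out

x = np.random.default_rng(1).standard_normal(128)
print(np.mean((x - block_quantize(x)) ** 2))       # per-block scales bound the error
```

Sharing a scale per small block, rather than per tensor, keeps outliers in one block from inflating the quantization step everywhere else.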
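
Entries 12-13 build on Stripes-style bit-serial execution, where runtime tracks operand precision. A minimal functional model of that idea in pure Python (a sketch of the principle, not the accelerator's actual datapath):

```python
def bit_serial_dot(activations, weights, precision):
    """Dot product processed one activation bit per 'cycle' (illustrative).

    activations: non-negative fixed-point ints; weights: ints.
    The loop runs `precision` times, so work is proportional to the
    activations' fixed-point precision rather than a fixed word width.
    """
    acc = 0
    for bit in range(precision):                   # one cycle per bit plane
        partial = sum(w for a, w in zip(activations, weights)
                      if (a >> bit) & 1)           # add weights where this bit is set
        acc += partial << bit                      # weight the partial sum by 2**bit
    return acc

acts, wts = [5, 3, 7], [2, 4, 1]                   # activations fit in 3 bits
assert bit_serial_dot(acts, wts, precision=3) == sum(a * w for a, w in zip(acts, wts))
# 3 cycles suffice here, versus 16 for a fixed 16-bit bit-serial pipeline.
```

Dynamic Stripes (entry 13) extends this by detecting at runtime, per group of activations, how few bits are actually needed.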