Search Results - "Dettmers, Tim"

  1.

    Accessible Foundation Models: Systems, Algorithms, and Science by Dettmers, Tim

    Published 01-01-2024
    “…The ever-increasing scale of foundation models, such as ChatGPT and AlphaFold, has revolutionized AI and science more generally. However, increasing scale also…”
    Dissertation
  2.

    The case for 4-bit precision: k-bit Inference Scaling Laws by Dettmers, Tim, Zettlemoyer, Luke

    Published 19-12-2022
    “…Quantization methods reduce the number of bits required to represent each parameter in a model, trading accuracy for smaller memory footprints and inference…”
    Journal Article
  3.

    Sparse Networks from Scratch: Faster Training without Losing Performance by Dettmers, Tim, Zettlemoyer, Luke

    Published 10-07-2019
    “…We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training…”
    Journal Article
  4.

    8-Bit Approximations for Parallelism in Deep Learning by Dettmers, Tim

    Published 14-11-2015
    “…The creation of practical deep learning data-products often requires parallelization across processors and computers to make deep learning feasible on large…”
    Journal Article
  5.

    QLoRA: Efficient Finetuning of Quantized LLMs by Dettmers, Tim, Pagnoni, Artidoro, Holtzman, Ari, Zettlemoyer, Luke

    Published 23-05-2023
    “…We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving…”
    Journal Article
  6.

    SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient by Ryabinin, Max, Dettmers, Tim, Diskin, Michael, Borzunov, Alexander

    Published 27-01-2023
    “…Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for…”
    Journal Article
  7.

    LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale by Dettmers, Tim, Lewis, Mike, Belkada, Younes, Zettlemoyer, Luke

    Published 15-08-2022
    “…Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for…”
    Journal Article
  8.

    Distributed Inference and Fine-tuning of Large Language Models Over The Internet by Borzunov, Alexander, Ryabinin, Max, Chumachenko, Artem, Baranchuk, Dmitry, Dettmers, Tim, Belkada, Younes, Samygin, Pavel, Raffel, Colin

    Published 13-12-2023
    “…Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion…”
    Journal Article
  9.

    Scaling Retrieval-Based Language Models with a Trillion-Token Datastore by Shao, Rulin, He, Jacqueline, Asai, Akari, Shi, Weijia, Dettmers, Tim, Min, Sewon, Zettlemoyer, Luke, Koh, Pang Wei

    Published 09-07-2024
    “…Scaling laws with respect to the amount of training data and the number of parameters allow us to predict the cost-benefit trade-offs of pretraining language…”
    Journal Article
  10.

    8-bit Optimizers via Block-wise Quantization by Dettmers, Tim, Lewis, Mike, Shleifer, Sam, Zettlemoyer, Luke

    Published 06-10-2021
    “…Stateful optimizers maintain gradient statistics over time, e.g., the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past gradient…”
    Journal Article
  11.

    Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model by Liu, Zeyu Leo, Dettmers, Tim, Lin, Xi Victoria, Stoyanov, Veselin, Li, Xian

    Published 23-05-2023
    “…Large and sparse feed-forward layers (S-FFN) such as Mixture-of-Experts (MoE) have proven effective in scaling up Transformers model size for…”
    Journal Article
  12.

    Stable and low-precision training for large-scale vision-language models by Wortsman, Mitchell, Dettmers, Tim, Zettlemoyer, Luke, Morcos, Ari, Farhadi, Ali, Schmidt, Ludwig

    Published 25-04-2023
    “…We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. 1) For acceleration, we introduce SwitchBack, a…”
    Journal Article
  13.

    BASE Layers: Simplifying Training of Large, Sparse Models by Lewis, Mike, Bhosale, Shruti, Dettmers, Tim, Goyal, Naman, Zettlemoyer, Luke

    Published 30-03-2021
    “…We introduce a new balanced assignment of experts (BASE) layer for large language models that greatly simplifies existing high capacity sparse layers. Sparse…”
    Journal Article
  14.

    Petals: Collaborative Inference and Fine-tuning of Large Models by Borzunov, Alexander, Baranchuk, Dmitry, Dettmers, Tim, Ryabinin, Max, Belkada, Younes, Chumachenko, Artem, Samygin, Pavel, Raffel, Colin

    Published 02-09-2022
    “…Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B,…”
    Journal Article
  15.

    Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models by Li, Margaret, Gururangan, Suchin, Dettmers, Tim, Lewis, Mike, Althoff, Tim, Smith, Noah A, Zettlemoyer, Luke

    Published 05-08-2022
    “…We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel training of large language models (LLMs). We show it is…”
    Journal Article
  16.

    SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression by Dettmers, Tim, Svirschevski, Ruslan, Egiazarian, Vage, Kuznedelev, Denis, Frantar, Elias, Ashkboos, Saleh, Borzunov, Alexander, Hoefler, Torsten, Alistarh, Dan

    Published 05-06-2023
    “…Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities. By compressing such LLMs via quantization to…”
    Journal Article
  17.

    Training Transformers Together by Borzunov, Alexander, Ryabinin, Max, Dettmers, Tim, Lhoest, Quentin, Saulnier, Lucile, Diskin, Michael, Jernite, Yacine, Wolf, Thomas

    Published 07-07-2022
    “…The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large…”
    Journal Article
  18.

    MatFormer: Nested Transformer for Elastic Inference by Devvrit, Kudugunta, Sneha, Kusupati, Aditya, Dettmers, Tim, Chen, Kaifeng, Dhillon, Inderjit, Tsvetkov, Yulia, Hajishirzi, Hannaneh, Kakade, Sham, Farhadi, Ali, Jain, Prateek

    Published 11-10-2023
    “…Transformer models are deployed in a wide range of settings, from multi-accelerator clusters to standalone mobile phones. The diverse inference constraints in…”
    Journal Article
  19.
  20.

    Convolutional 2D Knowledge Graph Embeddings by Dettmers, Tim, Minervini, Pasquale, Stenetorp, Pontus, Riedel, Sebastian

    Published 05-07-2017
    “…Link prediction for knowledge graphs is the task of predicting missing relationships between entities. Previous work on link prediction has focused on shallow,…”
    Journal Article