Search Results - "Dettmers, Tim"

1. Accessible Foundation Models: Systems, Algorithms, and Science
   Published 01-01-2024. “…The ever-increasing scale of foundation models, such as ChatGPT and AlphaFold, has revolutionized AI and science more generally. However, increasing scale also…”
   Dissertation

2. The case for 4-bit precision: k-bit Inference Scaling Laws
   Published 19-12-2022. “…Quantization methods reduce the number of bits required to represent each parameter in a model, trading accuracy for smaller memory footprints and inference…”
   Journal Article
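
The snippet above describes quantization's basic memory-versus-accuracy trade-off. Purely as an illustration of that trade-off (plain symmetric absmax quantization in PyTorch, not the paper's scaling-law methodology), the following quantizes a weight tensor to k bits and measures the reconstruction error:

```python
# Illustrative only: symmetric absmax quantization of a weight tensor to k bits,
# followed by dequantization, to show how accuracy degrades as bits shrink.
import torch

def absmax_quantize(w: torch.Tensor, k: int = 4):
    qmax = 2 ** (k - 1) - 1               # e.g. 7 for signed 4-bit integers
    scale = w.abs().max() / qmax          # one scale for the whole tensor
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(256, 256)
for k in (8, 4, 2):
    q, s = absmax_quantize(w, k)
    err = (w - dequantize(q, s)).abs().mean().item()
    print(f"{k}-bit mean abs error: {err:.4f}")
```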

3. Sparse Networks from Scratch: Faster Training without Losing Performance
   Published 10-07-2019. “…We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training…”
   Journal Article
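
The abstract is cut off before the method itself. As a loose illustration of keeping weights sparse throughout training (a fixed random mask re-applied after every optimizer step, not the paper's sparse-momentum algorithm, which also redistributes and regrows weights):

```python
# Loose illustration: train a layer while re-imposing a fixed binary mask so
# its weights stay sparse after every update.
import torch

layer = torch.nn.Linear(512, 512)
mask = (torch.rand_like(layer.weight) < 0.1).float()   # keep ~10% of the weights
with torch.no_grad():
    layer.weight.mul_(mask)

opt = torch.optim.SGD(layer.parameters(), lr=0.1, momentum=0.9)
for _ in range(3):
    x = torch.randn(64, 512)
    loss = layer(x).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
    with torch.no_grad():
        layer.weight.mul_(mask)   # re-impose the sparsity pattern
```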

4. 8-Bit Approximations for Parallelism in Deep Learning
   Published 14-11-2015. “…The creation of practical deep learning data-products often requires parallelization across processors and computers to make deep learning feasible on large…”
   Journal Article

5. QLoRA: Efficient Finetuning of Quantized LLMs
   Published 23-05-2023. “…We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving…”
   Journal Article
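
The abstract is truncated before the recipe, but a hedged sketch of QLoRA-style finetuning with the Hugging Face stack (transformers, peft, bitsandbytes) looks roughly like this; the model id and LoRA hyperparameters are placeholders, not values from the paper:

```python
# Hedged sketch: load a base model in 4-bit NF4 and train only LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit base weights
    bnb_4bit_quant_type="nf4",               # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,          # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",              # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)          # only the LoRA adapters are trainable
model.print_trainable_parameters()
```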

6. SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
   Published 27-01-2023. “…Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for…”
   Journal Article

7. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
   Published 15-08-2022. “…Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for…”
   Journal Article
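
The Int8 procedure described here ships in the bitsandbytes library and is exposed through the transformers quantization config; a minimal usage sketch, with a placeholder model id:

```python
# Minimal sketch: run generation with the base weights loaded in 8-bit.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-1.3b"               # placeholder model id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
inputs = tok("Quantization reduces", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```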

8. Distributed Inference and Fine-tuning of Large Language Models Over The Internet
   Published 13-12-2023. “…Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion…”
   Journal Article

9. Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
   Published 09-07-2024. “…Scaling laws with respect to the amount of training data and the number of parameters allow us to predict the cost-benefit trade-offs of pretraining language…”
   Journal Article

10. 8-bit Optimizers via Block-wise Quantization
   Published 06-10-2021. “…Stateful optimizers maintain gradient statistics over time, e.g., the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past gradient…”
   Journal Article
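
These 8-bit optimizers are available in the bitsandbytes library as drop-in replacements for their 32-bit PyTorch counterparts; a minimal sketch, assuming a CUDA device and a toy model:

```python
# Minimal sketch: 8-bit Adam keeps its optimizer state in 8-bit, so the state
# memory shrinks while the training loop stays unchanged.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()
opt = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)   # 8-bit optimizer states

x = torch.randn(32, 1024, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```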

11. Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model
   Published 23-05-2023. “…Large and sparse feed-forward layers (S-FFN) such as Mixture-of-Experts (MoE) have proven effective in scaling up Transformers model size for…”
   Journal Article
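
For readers unfamiliar with sparse feed-forward layers, here is a minimal top-k Mixture-of-Experts block in PyTorch, offered only as an illustration of the general S-FFN idea rather than any design from the paper:

```python
# Illustrative top-k MoE feed-forward block: a router picks k experts per token
# and mixes their outputs with softmax gates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)
        gates = F.softmax(topv, dim=-1)         # mixing weights over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                sel = topi[:, slot] == e        # tokens routed to expert e in this slot
                if sel.any():
                    out[sel] += gates[sel, slot].unsqueeze(1) * expert(x[sel])
        return out

print(TopKMoE()(torch.randn(10, 64)).shape)     # torch.Size([10, 64])
```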

12. Stable and low-precision training for large-scale vision-language models
   Published 25-04-2023. “…We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. 1) For acceleration, we introduce SwitchBack, a…”
   Journal Article

13. BASE Layers: Simplifying Training of Large, Sparse Models
   Published 30-03-2021. “…We introduce a new balanced assignment of experts (BASE) layer for large language models that greatly simplifies existing high capacity sparse layers. Sparse…”
   Journal Article
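
The core idea, assigning tokens to experts under an exact balance constraint, can be illustrated with an off-the-shelf linear assignment solver; the paper itself uses an auction algorithm, so treat this only as a sketch:

```python
# Illustrative balanced token-to-expert assignment: each expert receives exactly
# tokens/experts tokens, chosen to maximize total router affinity.
import numpy as np
from scipy.optimize import linear_sum_assignment

tokens, experts = 8, 4
capacity = tokens // experts                   # each expert gets exactly 2 tokens
scores = np.random.randn(tokens, experts)      # router affinity of token for expert

# Tile each expert `capacity` times so the assignment problem is square.
cost = -np.repeat(scores, capacity, axis=1)    # (tokens, experts * capacity)
row, col = linear_sum_assignment(cost)
assignment = col // capacity                   # expert index for each token
print(assignment, np.bincount(assignment, minlength=experts))
```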

14. Petals: Collaborative Inference and Fine-tuning of Large Models
   Published 02-09-2022. “…Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B,…”
   Journal Article

15. Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
   Published 05-08-2022. “…We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel training of large language models (LLMs). We show it is…”
   Journal Article
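
The abstract stops before the merge step; one simple way to combine independently trained expert models, shown here purely as an illustration rather than the paper's exact procedure, is parameter averaging:

```python
# Illustrative parameter averaging of independently trained model copies that
# share an architecture.
import torch

def average_state_dicts(state_dicts):
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(0) for k in keys}

experts = [torch.nn.Linear(16, 16) for _ in range(4)]   # stand-ins for expert LMs
merged = torch.nn.Linear(16, 16)
merged.load_state_dict(average_state_dicts([e.state_dict() for e in experts]))
```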

16. SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
   Published 05-06-2023. “…Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities. By compressing such LLMs via quantization to…”
   Journal Article
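
The abstract is cut off, but the title points at the core idea: keep a small set of outlier weights at higher precision and quantize the rest. An illustrative sketch of that decomposition, not the SpQR algorithm itself:

```python
# Illustrative sparse-plus-quantized decomposition: the top 1% of weights by
# magnitude stay at full precision, the remainder is quantized to 4-bit.
import torch

w = torch.randn(256, 256)
thresh = w.abs().quantile(0.99)                # top 1% by magnitude are "outliers"
outlier_mask = w.abs() > thresh

dense = torch.where(outlier_mask, torch.zeros_like(w), w)
scale = dense.abs().max() / 7                  # symmetric 4-bit range [-8, 7]
q = torch.clamp(torch.round(dense / scale), -8, 7)

w_hat = q * scale
w_hat[outlier_mask] = w[outlier_mask]          # restore outliers at full precision
print("mean abs error:", (w - w_hat).abs().mean().item())
```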

17. Training Transformers Together
   Published 07-07-2022. “…The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large…”
   Journal Article

18. MatFormer: Nested Transformer for Elastic Inference
   Published 11-10-2023. “…Transformer models are deployed in a wide range of settings, from multi-accelerator clusters to standalone mobile phones. The diverse inference constraints in…”
   Journal Article
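
MatFormer's elastic inference comes from nested sub-models; as a simplified illustration (not the paper's implementation), a feed-forward block can serve several compute budgets by using only a prefix of its hidden units:

```python
# Illustrative nested ("matryoshka") feed-forward block: smaller sub-models
# reuse a prefix of the full hidden dimension, so one set of weights serves
# several inference budgets.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NestedFFN(nn.Module):
    def __init__(self, d_model=64, d_ff=512):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)

    def forward(self, x, d_sub=512):
        # Use only the first d_sub hidden units for a cheaper sub-model.
        h = F.gelu(F.linear(x, self.w_in.weight[:d_sub], self.w_in.bias[:d_sub]))
        return F.linear(h, self.w_out.weight[:, :d_sub], self.w_out.bias)

ffn = NestedFFN()
x = torch.randn(4, 64)
print(ffn(x, d_sub=512).shape, ffn(x, d_sub=128).shape)
```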

19. OLMoE: Open Mixture-of-Experts Language Models
   Published 03-09-2024. “…We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses…”
   Journal Article

20. Convolutional 2D Knowledge Graph Embeddings
   Published 05-07-2017. “…Link prediction for knowledge graphs is the task of predicting missing relationships between entities. Previous work on link prediction has focused on shallow,…”
   Journal Article
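
The ConvE scoring function is compact enough to sketch: subject and relation embeddings are reshaped to 2D, stacked, convolved, projected back to the embedding dimension, and scored against every candidate entity. Dropout, batch normalization, and the exact dimensions are simplified away, so treat this as an approximation of the published model:

```python
# Simplified ConvE-style link-prediction scorer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEScorer(nn.Module):
    def __init__(self, n_entities=1000, n_relations=50, dim=200, h=10, w=20):
        super().__init__()
        assert h * w == dim
        self.h, self.w = h, w
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.conv = nn.Conv2d(1, 32, kernel_size=3)
        # input "image" is (2*h, w); conv output is (2*h - 2, w - 2)
        self.fc = nn.Linear(32 * (2 * h - 2) * (w - 2), dim)

    def forward(self, s, r):
        e_s = self.ent(s).view(-1, 1, self.h, self.w)
        e_r = self.rel(r).view(-1, 1, self.h, self.w)
        x = torch.cat([e_s, e_r], dim=2)           # stack along height -> (B, 1, 2h, w)
        x = F.relu(self.conv(x)).flatten(1)
        x = F.relu(self.fc(x))
        return x @ self.ent.weight.t()             # score every candidate object entity

scores = ConvEScorer()(torch.tensor([0, 1]), torch.tensor([3, 4]))
print(scores.shape)   # (2, 1000)
```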