Search Results - "Dettmers, Tim"

1. Accessible Foundation Models: Systems, Algorithms, and Science
   Published 01-01-2024. “…The ever-increasing scale of foundation models, such as ChatGPT and AlphaFold, has revolutionized AI and science more generally. However, increasing scale also…”
   Dissertation

2. The case for 4-bit precision: k-bit Inference Scaling Laws
   Published 19-12-2022. “…Quantization methods reduce the number of bits required to represent each parameter in a model, trading accuracy for smaller memory footprints and inference…”
   Journal Article
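
The snippet above describes quantization's basic memory-versus-accuracy trade-off. Purely as an illustration of that trade-off (plain symmetric absmax quantization in PyTorch, not the paper's scaling-law methodology), the following quantizes a weight tensor to k bits and measures the reconstruction error:

```python
# Illustrative only: symmetric absmax quantization of a weight tensor to k bits,
# followed by dequantization, to show how accuracy degrades as bits shrink.
import torch

def absmax_quantize(w: torch.Tensor, k: int = 4):
    qmax = 2 ** (k - 1) - 1               # e.g. 7 for signed 4-bit integers
    scale = w.abs().max() / qmax          # one scale for the whole tensor
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(256, 256)
for k in (8, 4, 2):
    q, s = absmax_quantize(w, k)
    err = (w - dequantize(q, s)).abs().mean().item()
    print(f"{k}-bit mean abs error: {err:.4f}")
```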

3. Sparse Networks from Scratch: Faster Training without Losing Performance
   Published 10-07-2019. “…We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training…”
   Journal Article
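
The abstract is cut off before the method itself. As a loose illustration of keeping weights sparse throughout training (a fixed random mask re-applied after every optimizer step, not the paper's sparse-momentum algorithm, which also redistributes and regrows weights):

```python
# Loose illustration: train a layer while re-imposing a fixed binary mask so
# its weights stay sparse after every update.
import torch

layer = torch.nn.Linear(512, 512)
mask = (torch.rand_like(layer.weight) < 0.1).float()   # keep ~10% of the weights
with torch.no_grad():
    layer.weight.mul_(mask)

opt = torch.optim.SGD(layer.parameters(), lr=0.1, momentum=0.9)
for _ in range(3):
    x = torch.randn(64, 512)
    loss = layer(x).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
    with torch.no_grad():
        layer.weight.mul_(mask)   # re-impose the sparsity pattern
```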

4. 8-Bit Approximations for Parallelism in Deep Learning
   Published 14-11-2015. “…The creation of practical deep learning data-products often requires parallelization across processors and computers to make deep learning feasible on large…”
   Journal Article

5. QLoRA: Efficient Finetuning of Quantized LLMs
   Published 23-05-2023. “…We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving…”
   Journal Article
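
The abstract is truncated before the recipe, but a hedged sketch of QLoRA-style finetuning with the Hugging Face stack (transformers, peft, bitsandbytes) looks roughly like this; the model id and LoRA hyperparameters are placeholders, not values from the paper:

```python
# Hedged sketch: load a base model in 4-bit NF4 and train only LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit base weights
    bnb_4bit_quant_type="nf4",               # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,          # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",              # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)          # only the LoRA adapters are trainable
model.print_trainable_parameters()
```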

6. SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
   Published 27-01-2023. “…Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for…”
   Journal Article

7. LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
   Published 15-08-2022. “…Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for…”
   Journal Article
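
The Int8 procedure described here ships in the bitsandbytes library and is exposed through the transformers quantization config; a minimal usage sketch, with a placeholder model id:

```python
# Minimal sketch: run generation with the base weights loaded in 8-bit.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-1.3b"               # placeholder model id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
inputs = tok("Quantization reduces", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```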

8. Distributed Inference and Fine-tuning of Large Language Models Over The Internet
   Published 13-12-2023. “…Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion…”
   Journal Article

9. Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
   Published 09-07-2024. “…Scaling laws with respect to the amount of training data and the number of parameters allow us to predict the cost-benefit trade-offs of pretraining language…”
   Journal Article

10. 8-bit Optimizers via Block-wise Quantization
   Published 06-10-2021. “…Stateful optimizers maintain gradient statistics over time, e.g., the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past gradient…”
   Journal Article
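
These 8-bit optimizers are available in the bitsandbytes library as drop-in replacements for their 32-bit PyTorch counterparts; a minimal sketch, assuming a CUDA device and a toy model:

```python
# Minimal sketch: 8-bit Adam keeps its optimizer state in 8-bit, so the state
# memory shrinks while the training loop stays unchanged.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()
opt = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)   # 8-bit optimizer states

x = torch.randn(32, 1024, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```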

11. Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model
   Published 23-05-2023. “…Large and sparse feed-forward layers (S-FFN) such as Mixture-of-Experts (MoE) have proven effective in scaling up Transformers model size for…”
   Journal Article
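
For readers unfamiliar with sparse feed-forward layers, here is a minimal top-k Mixture-of-Experts block in PyTorch, offered only as an illustration of the general S-FFN idea rather than any design from the paper:

```python
# Illustrative top-k MoE feed-forward block: a router picks k experts per token
# and mixes their outputs with softmax gates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)
        gates = F.softmax(topv, dim=-1)         # mixing weights over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                sel = topi[:, slot] == e        # tokens routed to expert e in this slot
                if sel.any():
                    out[sel] += gates[sel, slot].unsqueeze(1) * expert(x[sel])
        return out

print(TopKMoE()(torch.randn(10, 64)).shape)     # torch.Size([10, 64])
```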

12. Stable and low-precision training for large-scale vision-language models
   Published 25-04-2023. “…We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. 1) For acceleration, we introduce SwitchBack, a…”
   Journal Article

13. BASE Layers: Simplifying Training of Large, Sparse Models
   Published 30-03-2021. “…We introduce a new balanced assignment of experts (BASE) layer for large language models that greatly simplifies existing high capacity sparse layers. Sparse…”
   Journal Article
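
The core idea, assigning tokens to experts under an exact balance constraint, can be illustrated with an off-the-shelf linear assignment solver; the paper itself uses an auction algorithm, so treat this only as a sketch:

```python
# Illustrative balanced token-to-expert assignment: each expert receives exactly
# tokens/experts tokens, chosen to maximize total router affinity.
import numpy as np
from scipy.optimize import linear_sum_assignment

tokens, experts = 8, 4
capacity = tokens // experts                   # each expert gets exactly 2 tokens
scores = np.random.randn(tokens, experts)      # router affinity of token for expert

# Tile each expert `capacity` times so the assignment problem is square.
cost = -np.repeat(scores, capacity, axis=1)    # (tokens, experts * capacity)
row, col = linear_sum_assignment(cost)
assignment = col // capacity                   # expert index for each token
print(assignment, np.bincount(assignment, minlength=experts))
```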

14. Petals: Collaborative Inference and Fine-tuning of Large Models
   Published 02-09-2022. “…Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B,…”
   Journal Article

15. Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models
   Published 05-08-2022. “…We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel training of large language models (LLMs). We show it is…”
   Journal Article
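
The abstract stops before the merge step; one simple way to combine independently trained expert models, shown here purely as an illustration rather than the paper's exact procedure, is parameter averaging:

```python
# Illustrative parameter averaging of independently trained model copies that
# share an architecture.
import torch

def average_state_dicts(state_dicts):
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(0) for k in keys}

experts = [torch.nn.Linear(16, 16) for _ in range(4)]   # stand-ins for expert LMs
merged = torch.nn.Linear(16, 16)
merged.load_state_dict(average_state_dicts([e.state_dict() for e in experts]))
```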

16. SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
   Published 05-06-2023. “…Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities. By compressing such LLMs via quantization to…”
   Journal Article
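
The abstract is cut off, but the title points at the core idea: keep a small set of outlier weights at higher precision and quantize the rest. An illustrative sketch of that decomposition, not the SpQR algorithm itself:

```python
# Illustrative sparse-plus-quantized decomposition: the top 1% of weights by
# magnitude stay at full precision, the remainder is quantized to 4-bit.
import torch

w = torch.randn(256, 256)
thresh = w.abs().quantile(0.99)                # top 1% by magnitude are "outliers"
outlier_mask = w.abs() > thresh

dense = torch.where(outlier_mask, torch.zeros_like(w), w)
scale = dense.abs().max() / 7                  # symmetric 4-bit range [-8, 7]
q = torch.clamp(torch.round(dense / scale), -8, 7)

w_hat = q * scale
w_hat[outlier_mask] = w[outlier_mask]          # restore outliers at full precision
print("mean abs error:", (w - w_hat).abs().mean().item())
```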

17. Training Transformers Together
   Published 07-07-2022. “…The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large…”
   Journal Article

18. MatFormer: Nested Transformer for Elastic Inference
   Published 11-10-2023. “…Transformer models are deployed in a wide range of settings, from multi-accelerator clusters to standalone mobile phones. The diverse inference constraints in…”
   Journal Article
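
MatFormer's elastic inference comes from nested sub-models; as a simplified illustration (not the paper's implementation), a feed-forward block can serve several compute budgets by using only a prefix of its hidden units:

```python
# Illustrative nested ("matryoshka") feed-forward block: smaller sub-models
# reuse a prefix of the full hidden dimension, so one set of weights serves
# several inference budgets.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NestedFFN(nn.Module):
    def __init__(self, d_model=64, d_ff=512):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)

    def forward(self, x, d_sub=512):
        # Use only the first d_sub hidden units for a cheaper sub-model.
        h = F.gelu(F.linear(x, self.w_in.weight[:d_sub], self.w_in.bias[:d_sub]))
        return F.linear(h, self.w_out.weight[:, :d_sub], self.w_out.bias)

ffn = NestedFFN()
x = torch.randn(4, 64)
print(ffn(x, d_sub=512).shape, ffn(x, d_sub=128).shape)
```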

19. OLMoE: Open Mixture-of-Experts Language Models
   Published 03-09-2024. “…We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses…”
   Journal Article

20. Convolutional 2D Knowledge Graph Embeddings
   Published 05-07-2017. “…Link prediction for knowledge graphs is the task of predicting missing relationships between entities. Previous work on link prediction has focused on shallow,…”
   Journal Article
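
The ConvE scoring function is compact enough to sketch: subject and relation embeddings are reshaped to 2D, stacked, convolved, projected back to the embedding dimension, and scored against every candidate entity. Dropout, batch normalization, and the exact dimensions are simplified away, so treat this as an approximation of the published model:

```python
# Simplified ConvE-style link-prediction scorer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEScorer(nn.Module):
    def __init__(self, n_entities=1000, n_relations=50, dim=200, h=10, w=20):
        super().__init__()
        assert h * w == dim
        self.h, self.w = h, w
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.conv = nn.Conv2d(1, 32, kernel_size=3)
        # input "image" is (2*h, w); conv output is (2*h - 2, w - 2)
        self.fc = nn.Linear(32 * (2 * h - 2) * (w - 2), dim)

    def forward(self, s, r):
        e_s = self.ent(s).view(-1, 1, self.h, self.w)
        e_r = self.rel(r).view(-1, 1, self.h, self.w)
        x = torch.cat([e_s, e_r], dim=2)           # stack along height -> (B, 1, 2h, w)
        x = F.relu(self.conv(x)).flatten(1)
        x = F.relu(self.fc(x))
        return x @ self.ent.weight.t()             # score every candidate object entity

scores = ConvEScorer()(torch.tensor([0, 1]), torch.tensor([3, 4]))
print(scores.shape)   # (2, 1000)
```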