Search Results - "Bekman, Stas"
1. Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training
   Journal Article, published 26-06-2024
   “…Existing checkpointing approaches seem ill-suited for distributed training even though hardware limitations make model parallelism, i.e., sharding model state…”
2. The Case for Co-Designing Model Architectures with Hardware
   Journal Article, published 25-01-2024
   “…While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked…”
3. OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
   Journal Article, published 21-06-2023
   “…Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal…”
4. What Language Model to Train if You Have One Million GPU Hours?
   Journal Article, published 27-10-2022
   “…The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations…”
5. Datasets: A Community Library for Natural Language Processing
   Journal Article, published 06-09-2021
   “…The scale, variety, and quantity of publicly-available NLP datasets has grown rapidly as researchers propose new tasks, larger models, and novel benchmarks…”