Search Results - "Bekman, Stas"
1. Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training
   Journal Article, published 26-06-2024
   “…Existing checkpointing approaches seem ill-suited for distributed training even though hardware limitations make model parallelism, i.e., sharding model state…”
2. The Case for Co-Designing Model Architectures with Hardware
   Journal Article, published 25-01-2024
   “…While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked…”
3. OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
   Journal Article, published 21-06-2023
   “…Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal…”
4. What Language Model to Train if You Have One Million GPU Hours?
   Journal Article, published 27-10-2022
   “…The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations…”
5. Datasets: A Community Library for Natural Language Processing
   Journal Article, published 06-09-2021
   “…The scale, variety, and quantity of publicly-available NLP datasets has grown rapidly as researchers propose new tasks, larger models, and novel benchmarks…”