Search Results - "Bekman, Stas"

  • Showing 1 - 5 results of 5
  1.

    Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training by Lian, Xinyu; Jacobs, Sam Ade; Kurilenko, Lev; Tanaka, Masahiro; Bekman, Stas; Ruwase, Olatunji; Zhang, Minjia

    Published 26-06-2024
    “…Existing checkpointing approaches seem ill-suited for distributed training even though hardware limitations make model parallelism, i.e., sharding model state…”
    Journal Article
  2.

    The Case for Co-Designing Model Architectures with Hardware by Anthony, Quentin; Hatef, Jacob; Narayanan, Deepak; Biderman, Stella; Bekman, Stas; Yin, Junqi; Shafi, Aamir; Subramoni, Hari; Panda, Dhabaleswar

    Published 25-01-2024
    “…While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked…”
    Journal Article
  3.

    OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Laurençon, Hugo; Saulnier, Lucile; Tronchon, Léo; Bekman, Stas; Singh, Amanpreet; Lozhkov, Anton; Wang, Thomas; Karamcheti, Siddharth; Rush, Alexander M; Kiela, Douwe; Cord, Matthieu; Sanh, Victor

    Published 21-06-2023
    “…Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal…”
    Journal Article
  4.

    What Language Model to Train if You Have One Million GPU Hours? by Scao, Teven Le; Wang, Thomas; Hesslow, Daniel; Saulnier, Lucile; Bekman, Stas; Bari, M Saiful; Biderman, Stella; Elsahar, Hady; Muennighoff, Niklas; Phang, Jason; Press, Ofir; Raffel, Colin; Sanh, Victor; Shen, Sheng; Sutawika, Lintang; Tae, Jaesung; Yong, Zheng Xin; Launay, Julien; Beltagy, Iz

    Published 27-10-2022
    “…The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations…”
    Journal Article
  5.