Search Results - "Ryabinin, Max"
1
Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy
Published 13-10-2023
“…Text-to-image synthesis has recently attracted widespread attention due to rapidly improving quality and numerous practical applications. However, the language…”
Get full text
Journal Article
2
It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning
Published 22-06-2021
“…Commonsense reasoning is one of the key problems in natural language processing, but the relative scarcity of labeled data holds back the progress for…”
Get full text
Journal Article
3
Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
Published 12-01-2024
“…Large language models demonstrate a remarkable capability for learning to solve new tasks from a few examples. The prompt template, or the way the input…”
Get full text
Journal Article
4
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Published 10-02-2020
“…Advances in Neural Information Processing Systems 33 (2020) 3659-3672 Many recent breakthroughs in deep learning were achieved by training increasingly larger…”
Get full text
Journal Article
5
Multilingual Pretraining Using a Large Corpus Machine-Translated from a Single Source Language
Published 31-10-2024
“…English, as a very high-resource language, enables the pretraining of high-quality large language models (LLMs). The same cannot be said for most other…”
Get full text
Journal Article
6
Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics
Published 09-02-2023
“…Text-to-image generation models represent the next step of evolution in image synthesis, offering a natural way to achieve flexible yet fine-grained control…”
Get full text
Journal Article
7
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Published 27-01-2023
“…Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for…”
Get full text
Journal Article
8
Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets
Published 14-05-2021
“…Ensembles of machine learning models yield improved system performance as well as robust and interpretable uncertainty estimates; however, their inference…”
Get full text
Journal Article
9
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
Published 04-06-2024
“…As large language models gain widespread adoption, running them efficiently becomes crucial. Recent works on LLM inference use speculative decoding to achieve…”
Get full text
Journal Article
10
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
Published 19-02-2024
“…As the usage of large language models (LLMs) grows, performing efficient inference with these models becomes increasingly important. While speculative decoding…”
Get full text
Journal Article
11
Distributed Inference and Fine-tuning of Large Language Models Over The Internet
Published 13-12-2023
“…Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion…”
Get full text
Journal Article
12
Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees
Published 07-10-2021
“…https://proceedings.neurips.cc/paper_files/paper/2022/hash/5ac1428c23b5da5e66d029646ea3206d-Abstract-Conference.html Variational inequalities in general and…”
Get full text
Journal Article
13
Secure Distributed Training at Scale
Published 21-06-2021
“…Many areas of deep learning benefit from using increasingly larger neural networks trained on public data, as is the case for pre-trained models for NLP and…”
Get full text
Journal Article
14
Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
Published 04-03-2021
“…Training deep neural networks on large datasets can often be accelerated by using multiple compute nodes. This approach, known as distributed training, can…”
Get full text
Journal Article
15
RuCoLA: Russian Corpus of Linguistic Acceptability
Published 23-10-2022
“…Linguistic acceptability (LA) attracts the attention of the research community due to its many uses, such as testing the grammatical knowledge of language…”
Get full text
Journal Article
16
Petals: Collaborative Inference and Fine-tuning of Large Models
Published 02-09-2022
“…Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B,…”
Get full text
Journal Article
17
Training Transformers Together
Published 07-07-2022
“…The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large…”
Get full text
Journal Article
18
The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models
Published 08-04-2024
“…Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate…”
Get full text
Journal Article
19
Embedding Words in Non-Vector Space with Unsupervised Graph Learning
Published 06-10-2020
“…It has become a de-facto standard to represent words as elements of a vector space (word2vec, GloVe). While this approach is convenient, it is unnatural for…”
Get full text
Journal Article
20
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Published 13-03-2023
“…The high computational and memory requirements of large language model (LLM) inference make it feasible only with multiple high-end accelerators. Motivated by…”
Get full text
Journal Article