Search Results - "Raffel, Colin"
1
An Empirical Survey of Data Augmentation for Limited Data Learning in NLP
Published in Transactions of the Association for Computational Linguistics (14-03-2023) “…NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets. The dependence on abundant data prevents NLP…”
Journal Article
2
Efficient Methods for Natural Language Processing: A Survey
Published in Transactions of the Association for Computational Linguistics (12-07-2023) “…Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to…”
Journal Article
3
ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models
Published in Transactions of the Association for Computational Linguistics (25-03-2022) “…Most widely used pre-trained language models operate on sequences of tokens corresponding to word or subword units. By comparison, models that operate directly…”
Journal Article
4
Learning Hard Alignments with Variational Inference
Published in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (01-04-2018) “…There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition. Hard…”
Conference Proceeding
5
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
Published 02-01-2024 “…While large language models have proven effective in a huge range of downstream applications, they often generate text that is problematic or lacks a desired…”
Journal Article
6
Optimizing DTW-based audio-to-MIDI alignment and matching
Published in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (01-03-2016) “…Dynamic time warping (DTW) has proven to be an extremely effective method for both aligning and matching recordings of music to corresponding MIDI…”
Conference Proceeding, Journal Article
7
NPEFF: Non-Negative Per-Example Fisher Factorization
Published 06-10-2023 “…As deep learning models are deployed in more and more settings, it becomes increasingly important to be able to understand why they produce a given prediction,…”
Journal Article
8
Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching
Published 01-01-2016 “…Sequences of feature vectors are a natural way of representing temporal data. Given a database of sequences, a fundamental task is to find the database entry…”
Dissertation
9
A Combinatorial Perspective on the Optimization of Shallow ReLU Networks
Published 30-09-2022 “…The NP-hard problem of optimizing a shallow ReLU network can be characterized as a combinatorial search over each training example's activation pattern…”
Journal Article
10
Merging Models with Fisher-Weighted Averaging
Published 18-11-2021 “…Averaging the parameters of models that have the same architecture and initialization can provide a means of combining their respective capabilities. In this…”
Journal Article
11
Hickle: A HDF5-based python pickle replacement
Published in Journal of Open Source Software (17-12-2018)
Journal Article
12
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Published 15-02-2024 “…Large language models (LLMs) have become a dominant and important tool for NLP researchers in a wide range of tasks. Today, many researchers use LLMs in…”
Journal Article
13
Pruning subsequence search with attention-based embedding
Published in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (01-03-2016) “…Searching a large database to find a sequence that is most similar to a query can be prohibitively expensive, particularly if individual sequence comparisons…”
Conference Proceeding, Journal Article
14
Merging by Matching Models in Task Parameter Subspaces
Published 07-12-2023 “…Model merging aims to cheaply combine individual task-specific models into a single multitask model. In this work, we view past merging methods as leveraging…”
Journal Article
15
Soft Merging of Experts with Adaptive Routing
Published 06-06-2023 “…Sparsely activated neural networks with conditional computation learn to route their inputs through different "expert" subnetworks, providing a form of…”
Journal Article
16
Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data
Published 01-02-2023 “…Few-shot learning is valuable in many real-world applications, but learning a generalizable model without overfitting to the few labeled datapoints is…”
Journal Article
17
Realistic Evaluation of Model Merging for Compositional Generalization
Published 26-09-2024 “…Merging has become a widespread way to cheaply combine individual models into a single model that inherits their capabilities and attains better performance…”
Journal Article
18
Estimating timing and channel distortion across related signals
Published in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (01-05-2014) “…We consider the situation where there are multiple audio signals whose relationship is of interest. If these signals have been differently captured, the…”
Conference Proceeding
19
Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language
Published 02-10-2022 “…Deep learning models struggle with compositional generalization, i.e. the ability to recognize or generate novel combinations of observed elementary concepts…”
Journal Article
20
Learning to Route Among Specialized Experts for Zero-Shot Generalization
Published 08-02-2024 “…Recently, there has been a widespread proliferation of "expert" language models that are specialized to a specific task or domain through parameter-efficient…”
Journal Article