Search Results - "Thrush, Tristan"
1
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Published in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (01-06-2022) “…We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call…”
Conference Proceeding -
2
Compositional Neural Machine Translation by Removing the Lexicon from Syntax
Published 06-02-2020 “…The meaning of a natural language utterance is largely determined from its syntax and words. Additionally, there is evidence that humans process an utterance…”
Journal Article -
3
Rover Relocalization for Mars Sample Return by Virtual Template Synthesis and Matching
Published in IEEE Robotics and Automation Letters (01-04-2021) “…We consider the problem of rover relocalization in the context of the notional Mars Sample Return campaign. In this campaign, a rover (R1) needs to be capable…”
Journal Article -
4
Improving Pretraining Data Using Perplexity Correlations
Published 09-09-2024 “…Quality pretraining data is often seen as the key to high-performance language models. However, progress in understanding pretraining data has been slow due to…”
Journal Article -
5
ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation
Published 06-02-2024 “…This paper introduces the ColorSwap dataset, designed to assess and improve the proficiency of multimodal models in matching objects with their colors. The…”
Journal Article -
6
I am a Strange Dataset: Metalinguistic Tests for Language Models
Published 10-01-2024 “…Statements involving metalinguistic self-reference ("This paper has six sections.") are prevalent in many domains. Can current large language models (LLMs)…”
Journal Article -
7
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
Published 28-06-2023 “…We propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a language…”
Journal Article -
8
Nearest Neighbor Normalization Improves Multimodal Retrieval
Published 31-10-2024 “…EMNLP 2024. Multimodal models leverage large-scale pre-training to achieve strong but still imperfect performance on tasks such as image captioning, visual…”
Journal Article -
9
Investigating Novel Verb Learning in BERT: Selectional Preference Classes and Alternation-Based Syntactic Generalization
Published 04-11-2020 “…Previous studies investigating the syntactic abilities of deep learning models have not targeted the relationship between the strength of the grammatical…”
Journal Article -
10
ANLIzing the Adversarial Natural Language Inference Dataset
Published 23-10-2020 “…We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-scale human-and-model-in-the-loop natural language inference…”
Journal Article -
11
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
Published 31-12-2020 “…We present a human-and-model-in-the-loop process for dynamically generating datasets and training better performing and more robust hate detection models. We…”
Journal Article -
12
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Published 06-04-2022 “…We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call…”
Journal Article -
13
Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation
Published 15-03-2022 “…Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 8830-8848. Association for Computational Linguistics. Despite recent…”
Journal Article -
14
Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants
Published 16-12-2021 “…In Dynamic Adversarial Data Collection (DADC), human annotators are tasked with finding examples that models struggle to predict correctly. Models trained on…”
Journal Article -
15
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate
Published 12-08-2021 “…2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022). Detecting online hate is a complex task, and…”
Journal Article -
16
Measuring Data
Published 09-12-2022 “…We identify the task of measuring data to quantitatively characterize the composition of machine learning data and datasets. Similar to an object's height,…”
Journal Article -
17
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking
Published 20-05-2021 “…We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison, integrated with the Dynabench…”
Journal Article -
18
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks
Published 04-04-2022 “…We introduce Dynatask: an open source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required for hosting…”
Journal Article -
19
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements
Published 30-09-2022 “…Evaluation is a key part of machine learning (ML), yet there is a lack of support and tooling to enable its informed and systematic practice. We introduce…”
Journal Article -
20
Dynabench: Rethinking Benchmarking in NLP
Published 07-04-2021 “…We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports…”
Journal Article