Search Results - "Longpre, Shayne"
1
MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering
Published in Transactions of the Association for Computational Linguistics (06-12-2021)“…Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers…”
Journal Article -
2
How Big Data Confers Market Power to Big Tech: Leveraging the Perspective of Data Science
Published in Antitrust bulletin (01-09-2020)“…Data-hungry applications are central to the largest online platforms. Using a novel approach that leverages data science to inform the economics, we…”
Journal Article -
3
Considerations for governing open foundation models
Published in Science (American Association for the Advancement of Science) (11-10-2024)“…Different policy proposals may disproportionately affect the innovation ecosystem…”
Journal Article -
4
Future and AI-Ready Data Strategies: Response to DOC RFI on AI and Open Government Data Assets
Published 26-07-2024“…The following is a response to the US Department of Commerce's Request for Information (RFI) regarding AI and Open Government Data Assets. First, we commend…”
Journal Article -
5
A large-scale audit of dataset licensing and attribution in AI
Published in Nature machine intelligence (30-08-2024)“…The race to train language models on vast, diverse and inconsistently documented datasets raises pressing legal and ethical concerns. To improve data…”
Journal Article -
6
A Systematic Review of NeurIPS Dataset Management Practices
Published 31-10-2024“…As new machine learning methods demand larger training datasets, researchers and developers face significant challenges in dataset management. Although ethics…”
Journal Article -
7
AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research
Published 03-05-2024“…The recent embrace of machine learning (ML) in the development of autonomous weapons systems (AWS) creates serious risks to geopolitical stability and the free…”
Journal Article -
8
The Foundation Model Transparency Index v1.1: May 2024
Published 17-07-2024“…Foundation models are increasingly consequential yet extremely opaque. To characterize the status quo, the Foundation Model Transparency Index was launched in…”
Journal Article -
9
Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks
Published 20-08-2022“…Quantization, knowledge distillation, and magnitude pruning are among the most popular methods for neural network compression in NLP. Independently, these…”
Journal Article -
10
Foundation Model Transparency Reports
Published 25-02-2024“…Published in AIES 2024. Foundation models are critical digital technologies with sweeping societal impact that necessitates transparency. To codify how…”
Journal Article -
11
How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?
Published 04-10-2020“…Task-agnostic forms of data augmentation have proven widely effective in computer vision, even on pretrained models. In NLP similar results are reported most…”
Journal Article -
12
On the Transferability of Minimal Prediction Preserving Inputs in Question Answering
Published 17-09-2020“…Recent work (Feng et al., 2018) establishes the presence of short, uninterpretable input fragments that yield high confidence and accuracy in neural models. We…”
Journal Article -
13
MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering
Published 29-07-2020“…Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers…”
Journal Article -
14
The Foundation Model Transparency Index
Published 19-10-2023“…Foundation models have rapidly permeated society, catalyzing a wave of generative AI applications spanning enterprise and consumer-facing contexts. While the…”
Journal Article -
15
Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?
Published 19-04-2024“…Proceedings of ICML 2024, in PMLR 235:32711-32725. URL: https://proceedings.mlr.press/v235/longpre24b.html New capabilities in foundation models are owed in…”
Journal Article -
16
Evaluating Entity Disambiguation and the Role of Popularity in Retrieval-Based NLP
Published 12-06-2021“…Retrieval is a core component for open-domain NLP tasks. In open-domain tasks, multiple entities can share a name, making disambiguation an inherent yet…”
Journal Article -
17
To Err is AI: A Case Study Informing LLM Flaw Reporting Practices
Published 15-10-2024“…In August of 2024, 495 hackers generated evaluations in an open-ended bug bounty targeting the Open Language Model (OLMo) from The Allen Institute for AI. A…”
Journal Article -
18
Leveraging Query Resolution and Reading Comprehension for Conversational Passage Retrieval
Published 17-02-2021“…This paper describes the participation of UvA.ILPS group at the TREC CAsT 2020 track. Our passage retrieval pipeline consists of (i) an initial retrieval…”
Journal Article -
19
A Comparison of Question Rewriting Methods for Conversational Passage Retrieval
Published 18-01-2021“…Conversational passage retrieval relies on question rewriting to modify the original question so that it no longer depends on the conversation history. Several…”
Journal Article -
20
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Published 02-05-2024“…Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs. However, concerns including transparency,…”
Journal Article