Search Results - "Akiki, Christopher"
1
MuSe: The Musical Sentiment Dataset
Published in Journal of Open Humanities Data (07-07-2021) “…The MuSe (Music Sentiment) dataset contains sentiment information for 90,001 songs. We computed scores for the affective dimensions of valence, dominance, and…”
Journal Article
2
BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model
Published in Psychofenia (09-12-2022) “…The BigScience Workshop was a value-driven initiative that spanned one and half years of interdisciplinary research and culminated in the creation of ROOTS, a…”
Conference Proceeding
3
BERTian Poetics: Constrained Composition with Masked LMs
Published 28-10-2021 “…Masked language models have recently been interpreted as energy-based sequence models that can be generated from using a Metropolis–Hastings sampler. This…”
Journal Article
4
Tracking Discourse Influence in Darknet Forums
Published 04-02-2022 “…This technical report documents our efforts in addressing the tasks set forth by the 2021 AMoC (Advanced Modelling of Cyber Criminal Careers) Hackathon. Our…”
Journal Article
5
Stable Bias: Analyzing Societal Representations in Diffusion Models
Published 20-03-2023 “…As machine learning-enabled Text-to-Image (TTI) systems are becoming increasingly prevalent and seeing growing adoption as commercial services, characterizing…”
Journal Article
6
How Train-Test Leakage Affects Zero-shot Retrieval
Published 29-06-2022 “…Neural retrieval models are often trained on (subsets of) the millions of queries of the MS MARCO / ORCAS datasets and then tested on the 250 Robust04 queries…”
Journal Article
7
Exploring Hyperparameter Usage and Tuning in Machine Learning Research
Published in 2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN) (01-05-2023) “…The success of machine learning (ML) models depends on careful experimentation and optimization of their hyperparameters. Tuning can affect the reliability and…”
Conference Proceeding
8
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face
Published 28-02-2023 “…We present Spacerini, a tool that integrates the Pyserini toolkit for reproducible information retrieval research with Hugging Face to enable the seamless…”
Journal Article
9
Towards Openness Beyond Open Access: User Journeys through 3 Open AI Collaboratives
Published 20-01-2023 “…Open Artificial Intelligence (Open source AI) collaboratives offer alternative pathways for how AI can be developed beyond well-resourced technology companies…”
Journal Article
10
BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model
Published 09-12-2022 “…The BigScience Workshop was a value-driven initiative that spanned one and half years of interdisciplinary research and culminated in the creation of ROOTS, a…”
Journal Article
11
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration
Published 02-06-2023 “…Noticing the urgent need to provide tools for fast and user-friendly qualitative analysis of large-scale textual corpora of the modern NLP, we propose to turn…”
Journal Article
12
The ROOTS Search Tool: Data Transparency for LLMs
Published 27-02-2023 “…ROOTS is a 1.6TB multilingual text corpus developed for the training of BLOOM, currently the largest language model explicitly accompanied by commensurate data…”
Journal Article
13
Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
Published 11-04-2022 “…In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution…”
Journal Article
14
StarCoder 2 and The Stack v2: The Next Generation
Published 29-02-2024 “…The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces…”
Journal Article
15
SantaCoder: don't reach for the stars
Published 09-01-2023 “…The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes…”
Journal Article
16
StarCoder: may the source be with you
Published 09-05-2023 “…The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces…”
Journal Article
17
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Published 07-03-2023 “…As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The…”
Journal Article