Search Results - "van Miltenburg, Emiel"
-
1
Human evaluation of automatically generated text: Current trends and best practice guidelines
Published in Computer speech & language (01-05-2021) “…• The current paper provides an overview of human evaluation practices in NLG. • The current paper gives an overview of the steps necessary to undertake a human…”
Journal Article -
2
Scalar Diversity
Published in Journal of semantics (Nijmegen) (01-02-2016) “…We present experimental evidence showing that there is considerable variation between the rates at which scalar expressions from different lexical…”
Journal Article -
3
Image captioning in different languages
Published 31-05-2024“…This short position paper provides a manually curated list of non-English image captioning datasets (as of May 2024). Through this list, we can observe the…”
Journal Article -
4
Evaluating NLG systems: A brief introduction
Published 29-03-2023“…This year the International Conference on Natural Language Generation (INLG) will feature an award for the paper with the best evaluation. The purpose of this…”
Journal Article -
5
On the use of human reference data for evaluating automatic image descriptions
Published 15-06-2020“…Automatic image description systems are commonly trained and evaluated using crowdsourced, human-generated image descriptions. The best-performing system is…”
Journal Article -
6
Evaluating Task-oriented Dialogue Systems: A Systematic Review of Measures, Constructs and their Operationalisations
Published 21-12-2023“…This review gives an extensive overview of evaluation methods for task-oriented dialogue systems, paying special attention to practical applications of…”
Journal Article -
7
Implicit causality in GPT-2: a case study
Published 08-12-2022“…This case study investigates the extent to which a language model (GPT-2) is able to capture native speakers' intuitions about implicit causality in a sentence…”
Journal Article -
8
Stereotyping and Bias in the Flickr30K Dataset
Published 19-05-2016“…An untested assumption behind the crowdsourced descriptions of the images in the Flickr30K dataset (Young et al., 2014) is that they "focus only on the…”
Journal Article -
9
Preregistering NLP Research
Published 11-03-2021“…Preregistration refers to the practice of specifying what you are going to do, and what you expect to find in your study, before carrying out the study. This…”
Journal Article -
10
Room for improvement in automatic image description: an error analysis
Published 13-04-2017“…In recent years we have seen rapid and significant progress in automatic image description but what are the open problems in this area? Most work has been…”
Journal Article -
11
Detecting and ordering adjectival scalemates
Published 30-04-2015“…This paper presents a pattern-based method that can be used to infer adjectival scales, such as <lukewarm, warm, hot>, from a corpus. Specifically, the…”
Journal Article -
12
Cross-linguistic differences and similarities in image descriptions
Published 06-07-2017“…Automatic image description systems are commonly trained and evaluated on large image description datasets. Recently, researchers have started to collect such…”
Journal Article -
13
Automatic Construction of Evaluation Suites for Natural Language Generation Datasets
Published 16-06-2021“…Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets…”
Journal Article -
14
Pragmatic factors in image description: the case of negations
Published 20-06-2016“…We provide a qualitative analysis of the descriptions containing negations (no, not, n't, nobody, etc) in the Flickr30K corpus, and a categorization of…”
Journal Article -
15
Underreporting of errors in NLG output, and what to do about it
Published 02-08-2021“…We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an…”
Journal Article -
16
Neural data-to-text generation: A comparison between pipeline and end-to-end architectures
Published 23-08-2019“…Traditionally, most data-to-text applications have been designed using a modular pipeline architecture, in which non-linguistic input data is converted into…”
Journal Article -
17
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Published 02-05-2023“…We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human…”
Journal Article -
18
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Published 02-02-2021“…We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly…”
Journal Article