Search Results - "Mahamood, Saad"
1. Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices
Published 17-08-2024. “…Automatic metrics are extensively used to evaluate natural language processing systems. However, there has been increasing focus on how they are used and…”
Journal Article
2. On the Role of Summary Content Units in Text Summarization Evaluation
Published 02-04-2024. “…At the heart of the Pyramid evaluation method for text summarization lie human written summary content units (SCUs). These SCUs are concise sentences that…”
Journal Article
3. Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization
Published 20-12-2022. “…To prevent the costly and inefficient use of resources on low-quality annotations, we want a method for creating a pool of dependable annotators who can…”
Journal Article
4. Generating affective natural language for parents of neonatal infants
Published 01-01-2010. “…The thesis presented here describes original research in the field of Natural Language Generation (NLG). NLG is the subfield of artificial intelligence that is…”
Dissertation
5. Automatic Construction of Evaluation Suites for Natural Language Generation Datasets
Published 16-06-2021. “…Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets…”
Journal Article
6. Underreporting of errors in NLG output, and what to do about it
Published 02-08-2021. “…We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an…”
Journal Article
7. Neonatal Intensive Care Information for Parents: An Affective Approach
Published in 2008 21st IEEE International Symposium on Computer-Based Medical Systems (01-06-2008). “…Based upon qualitative work done with former Neonatal Intensive Care Unit parents, we propose a potential user model to estimate the level of stress/anxiety…”
Conference Proceeding
8. Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Published 02-05-2023. “…We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human…”
Journal Article
9. GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Published 22-06-2022. “…Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison…”
Journal Article
10. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Published 05-12-2021. “…Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the…”
Journal Article
11. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Published 02-02-2021. “…We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly…”
Journal Article