Automated Program Repair: Emerging trends pose and expose problems for benchmarks
Main Authors: | , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 08-05-2024 |
Summary: | Machine learning (ML) now pervades the field of Automated Program Repair
(APR). Algorithms deploy neural machine translation and large language models
(LLMs) to generate software patches, among other tasks. However, there are
important differences between these applications of ML and earlier work.
Evaluations and comparisons must take care to ensure that results are valid and
likely to generalize. A challenge is that the most popular APR evaluation
benchmarks were not designed with ML techniques in mind. This is especially
true for LLMs, whose large and often poorly disclosed training datasets may
include problems on which they are evaluated. |
DOI: | 10.48550/arxiv.2405.05455 |