of correctness. Finding such a measure
is a difficult and unsolved problem,
which applies to patches produced both by humans and by machines. To
date, researchers have assessed quality
using human judgment, crowdsourced
evaluations, comparison to developer
patches of historical bugs, or patched
program performance on indicative
program workloads or held-out test
cases. The recent work of Xiong et al. 38
provides a novel outlook for filtering
patches based on the behavior of the
patched program vis-a-vis the original
program on passing and failing tests.
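The intuition behind such behavior-based filtering can be sketched in a few lines. This is a minimal illustration of the general idea, not the method of the cited work: it assumes that "behavior" can be approximated by comparing program outputs, and every name in it (behavior_changed, passes_behavior_filter, the toy buggy_abs program, and its two candidate patches) is hypothetical.

```python
from typing import Callable, Sequence

Program = Callable[[int], int]


def behavior_changed(original: Program, patched: Program, test_input: int) -> bool:
    """Crude behavior proxy (an assumption): compare outputs only."""
    return original(test_input) != patched(test_input)


def passes_behavior_filter(
    original: Program,
    patched: Program,
    passing_inputs: Sequence[int],
    failing_inputs: Sequence[int],
) -> bool:
    """Keep a patch only if it leaves passing tests alone and alters failing ones."""
    keeps_passing = all(
        not behavior_changed(original, patched, x) for x in passing_inputs
    )
    changes_failing = all(
        behavior_changed(original, patched, x) for x in failing_inputs
    )
    return keeps_passing and changes_failing


# Hypothetical program under repair: meant to compute the absolute value.
def buggy_abs(x: int) -> int:
    return x  # wrong for negative inputs


# Hypothetical candidate patches.
def patch_abs(x: int) -> int:
    return -x if x < 0 else x


def patch_negate(x: int) -> int:
    return -x  # fixes negative inputs but breaks positive ones


PASSING_INPUTS = [0, 3, 7]   # inputs on which buggy_abs already passes
FAILING_INPUTS = [-2, -5]    # inputs on which buggy_abs fails

print(passes_behavior_filter(buggy_abs, patch_abs, PASSING_INPUTS, FAILING_INPUTS))
print(passes_behavior_filter(buggy_abs, patch_negate, PASSING_INPUTS, FAILING_INPUTS))
```

Such a filter is only a heuristic: a patch that survives it is not thereby shown to be correct, and a realistic comparison would likely rely on richer observations than outputs alone.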
Alternative oracles. The bulk of the
existing literature focuses on test-based
repair where the correctness criterion is
given as a test suite. Richer correctness
properties, for example, assertions or
contracts, can be used to guide repair
when available. 34 Other approaches
consider alternative oracles, such as
potential invariants inferred from dynamic executions. 20 Such approaches
can follow the “bugs as deviant behavior” philosophy, where deviations of an
execution from “normal” executions
are observed and avoided. In particular, Weimer et al. 35 provide an overview
of various (partial) oracles that can be
used for repair.
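As a rough illustration of the "bugs as deviant behavior" idea, the following sketch infers simple range invariants from executions assumed to be normal and then reports the variables whose values deviate from them in a new execution. The choice of range invariants, the observation format (a mapping from variable names to integer values), and the function names are all assumptions made for this example; practical invariant inference supports far richer properties.

```python
from typing import Dict, Iterable, List, Tuple

Observation = Dict[str, int]             # variable name -> observed value
Invariants = Dict[str, Tuple[int, int]]  # variable name -> (min, max) seen


def infer_range_invariants(normal_runs: Iterable[Observation]) -> Invariants:
    """Record, per variable, the smallest and largest value seen in normal runs."""
    invariants: Invariants = {}
    for observation in normal_runs:
        for name, value in observation.items():
            low, high = invariants.get(name, (value, value))
            invariants[name] = (min(low, value), max(high, value))
    return invariants


def deviations(invariants: Invariants, observation: Observation) -> List[str]:
    """Return the variables whose values fall outside the inferred ranges."""
    return sorted(
        name
        for name, value in observation.items()
        if name in invariants
        and not (invariants[name][0] <= value <= invariants[name][1])
    )


# Hypothetical observations collected at one program point.
NORMAL_RUNS = [
    {"index": 0, "length": 4},
    {"index": 3, "length": 4},
    {"index": 1, "length": 2},
]

inferred = infer_range_invariants(NORMAL_RUNS)
print(inferred)
print(deviations(inferred, {"index": 7, "length": 4}))
```

A repair technique built on such an oracle would then look for modifications that keep executions within the inferred ranges. Because the invariants are only as good as the executions they were inferred from, they act as a partial oracle at best.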
Correctness guarantees. Few of today’s repair techniques provide any
guarantees about the correctness of
produced patches, which can hinder
the application of automated repair,
especially to safety-critical software. If
correctness guarantees are available
as properties, such as pre-conditions,
post-conditions, and object invariants,
these can be used to guide program
repair. The work of Logozzo and Ball 11
reports such an effort where repair
attempts to increase the number of
property-preserving executions, while
reducing the number of violating executions. However, such formal techniques depend on the availability of the properties that drive the repair.
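The criterion of more property-preserving and fewer violating executions can be made concrete with a small sketch. It is a simplification under stated assumptions: executions are counted over a finite sample of concrete inputs, properties are given as a pre-condition and a post-condition written as ordinary Python functions, and an exception counts as a violation. It does not reproduce the analysis used in the cited work, and all names are hypothetical.

```python
from typing import Callable, Iterable, Tuple

Program = Callable[[int], int]
Precondition = Callable[[int], bool]
Postcondition = Callable[[int, int], bool]  # (input, result) -> holds?


def count_executions(
    program: Program,
    precondition: Precondition,
    postcondition: Postcondition,
    inputs: Iterable[int],
) -> Tuple[int, int]:
    """Return (property-preserving, violating) counts over the sampled inputs."""
    preserving = violating = 0
    for value in inputs:
        if not precondition(value):
            continue  # inputs outside the pre-condition are not counted
        try:
            holds = postcondition(value, program(value))
        except Exception:
            holds = False  # assumption: a crash counts as a violation
        if holds:
            preserving += 1
        else:
            violating += 1
    return preserving, violating


def is_improvement(before: Tuple[int, int], after: Tuple[int, int]) -> bool:
    """A repair should add preserving executions and remove violating ones."""
    return after[0] > before[0] and after[1] < before[1]


# Hypothetical example: an index must stay within the bounds of a 10-slot buffer.
def buggy_index(i: int) -> int:
    return i


def repaired_index(i: int) -> int:
    return min(max(i, 0), 9)


def any_input(i: int) -> bool:
    return True


def in_bounds(i: int, result: int) -> bool:
    return 0 <= result < 10


SAMPLE_INPUTS = range(-5, 15)

before = count_executions(buggy_index, any_input, in_bounds, SAMPLE_INPUTS)
after = count_executions(repaired_index, any_input, in_bounds, SAMPLE_INPUTS)
print(before, after, is_improvement(before, after))
```

Counting sampled executions offers no guarantee beyond the sample. The sketch only shows how available properties can rank candidate repairs.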
Maintainability. Once a correct fix
has been detected and applied to the
code base, the fixed code should be as
easy to maintain as a human fix. Initial
work in this domain has investigated
the effect of automatically generated
patches on human maintenance
behavior. 4 More study is needed to
develop a foundational understanding of
change quality, especially with respect
to the human developers who will
interact with a modified system.
A promising avenue for tackling the
quality challenge is to leverage
information available from other
development artifacts, including
documentation or formal specifications, language
specifications and type systems, or
source control histories of either the
program under repair or the broad
corpora of freely available open source
software. Such additional information
can reduce the repair search space by
imposing new constraints on potential
program modifications (for example,
as suggested by a type system) and
increase the probability that the
produced patch is human-acceptable.
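To illustrate how a type system can shrink the search space, the sketch below discards candidate replacement expressions whose result type does not match the type expected at the repair location. The Candidate record, the exact-match rule (which ignores subtyping), and the example expressions are assumptions made for illustration; a real repair tool would obtain types from the compiler or a type checker.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A hypothetical replacement expression with its statically known type."""
    expression: str
    result_type: type


def prune_by_type(candidates: List[Candidate], expected_type: type) -> List[Candidate]:
    """Keep only candidates whose type exactly matches the expected type."""
    return [c for c in candidates if c.result_type is expected_type]


# Hypothetical candidates for an expression that must evaluate to an int.
CANDIDATES = [
    Candidate("len(items)", int),
    Candidate("items[0]", str),
    Candidate("count + 1", int),
    Candidate("name.upper()", str),
    Candidate("found", bool),
]

kept = prune_by_type(CANDIDATES, int)
print([c.expression for c in kept])
```

In this toy case the type constraint alone removes three of the five candidates before any test is run, which is the kind of search-space reduction the paragraph above describes.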
Scope. The scope challenge is about
further extending the kinds of bugs
and programs to which automated
repair applies.
General-purpose repair. Research
in program analysis has long focused
on special-purpose repair tools for
specific kinds of errors, such as buffer
overflow errors, 20 or bugs in
domain-specific languages. 24 More recent work,
as discussed earlier, focuses on
general-purpose repair tools that do not
make any assumptions about the kind
of bugs under consideration. While
automatically fixing all bugs seems out of
reach in the foreseeable future,
targeting a broad set of bugs remains an
important challenge.
Complex programs and patches.
Many of the key innovations in the
initial research in program repair
concerned the scalability of techniques
to complex programs. For example,
search-based techniques moved from
reasoning over populations of
program ASTs to populations of small edit
programs (the patches themselves)
and developed other techniques to
effectively constrain the search space.
Constraint-based repair strategies
have moved from reasoning about the
semantics of entire methods to only
reasoning about the desired change
in behavior. These efforts enable
scaling to programs of significant size,
and multi-line repairs. 16 We
anticipate that scalability will periodically
return to the fore as program repair
techniques engage in more complex
reasoning. We emphasize here that
program repair techniques should
remain scalable with respect to large