At this point, V passes all of the test cases, and the search
terminates with V as the primary repair. The minimization
step is invoked to discard unnecessary changes. Compared
to the original program (and using the line numbers from
the original), there are three key changes: c1 = “days −=
366” deleted from line 6; c2 = “days −= 366” inserted
between lines 9 and 10; and c3 = “days −= 366” inserted
between lines 10 and 11. Only c1 and c3 are necessary to pass
all tests, so change c2 is deleted, producing the final repair:
// inserted
On average, constructing and minimizing a repair for the
Zune fragment shown here takes our prototype a total 42 s,
including the time to compile and evaluate variants against
a suite of five positive and two negative tests.
4. ResuLts
To date, we have repaired 20 defects in modules totaling
186kLOC from 20 programs totaling 2.3MLOC (not all
shown here). We have repaired defects from eight classes:
infinite loop, segmentation fault, heap buffer overrun,
nonoverflow denial of service, integer overflow, invalid
exception, incorrect output, and format string vulnerability. Constructing a repair requires 1428 s on average,
most of which is spent performing an average of 3903
fitness evaluations. In Table 1, we summarize results for
11 benchmark programs reported in Forrest et al. 8 and
Weimer et al. 24 The benchmark programs, test cases, GP
code, and the supporting infrastructure used to generate
and reproduce these results are available at: http://gen
prog.adaptive.cs.unm.edu/.
In all of our experiments, a standard trial uses the
following setup. The population size is 40, and GP runs
for a maximum of 20 generations. For the first 10 generations, the global mutation rate is 0.06. If no primary
repair is found, the current population is discarded, the
global mutation rate is halved to 0.03, and, if possible, the
weighted path is restricted to statements visited solely
during the negative test case, and the GP is run for 10
additional generations. These results show that GP can
automatically discover repairs for a variety of documented
bugs in production C programs.
1 void zunebug_repair (int days) {
2 int year = 1980;
3 while (days > 365) {
4 if (isLeapYear (year)) {
5 if (days > 366) {
6 // days −= 366; // deleted
7 year += 1;
8 }
9 else {
10 }
11 days −= 366;
12 } else {
13 days −= 365;
14 year += 1;
15 }
16 }
17 printf (“the year is %dn”, year);
18 }
table 1. eleven defects repaired by Genetic Programming, summarized from previous work. the size of each program is given in lines of code
as well as weighted path units (see section 2. 2). each repair used five or six positive tests and one or two negative tests. the “time” column
gives the total wall-clock time required to produce and minimize the repair (on a successful trial). the “fitness evals” column lists the
number of times the entire fitness function was called before a repair was found (averaged over only the successful trials). the “Repair size”
column gives the size of the final minimized repair, as measured in lines by the unix diff utility.
Program
gcd
zune
uniq utx 4. 3
look utx 4. 3
look svr 4.0
units svr 4.0
deroff utx 4. 3
nullhttpd
0.5.0
indent 1. 9. 1
Lines of code
22
28
1146
1169
1363
1504
2236
5575
Weighted
Path
1. 3
2. 9
81. 5
213.0
32. 4
2159.7
251.4
768.5
Description
euclid’s algorithm
ms Zune excerpt
duplicate filtering
dictionary lookup
dictionary lookup
metric conversion
document processing
Webserver
fault
infinite loop
infinite loop
segmentation fault
segmentation fault
infinite loop
segmentation fault
segmentation fault
heap buffer overrun
time (s)
153
42
34
45
55
109
131
578
fitness
evals
45.0
203.5
15. 5
20. 1
13. 5
61. 7
28. 6
95. 1
Repair
size
2
4
4
11
3
4
3
5
9906
1435.9
source code
formatting
lexical analyzer
generator
Graphical tetris
game
infinite loop
546
108. 6
2
flex 2. 5.4a
18775
3836.6
segmentation fault
230
39. 4
3
atris 1.0.6
21553
34.0
stack buffer overrun
80
20. 2
3
Total
63277
8817.2
2003
651.2