Figure 3. Relative number of reports per bucket and CDF for top 20 buckets from Office 2010 ITP. Black bars are buckets for bugs fixed in three-week sample period. [Panels: Excel, Outlook, PowerPoint, Word; x-axis: bucket rank 1 to 20; y-axis: relative # of reports, 0% to 100%.]
the relative occurrence and cumulative distribution functions (CDFs) for the top 20 buckets of programs from the
Microsoft Office 2010 internal technical preview (ITP). The
top 20 bugs account for 30%–50% of all error reports. The
goal of the ITP was to find and fix as many bugs as possible
using WER before releasing a technical preview to customers. These graphs capture the team’s progress just 3 weeks
into the ITP. The ITP had been installed by 9000 internal
users, error reports had been collected, and the programmers had already fixed bugs responsible for over 22% of the
error reports. The team would spend another 3 weeks collecting error reports and fixing bugs before releasing a technical preview to customers.
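The per-bucket share and CDF plotted in Figure 3 can be reproduced from a list of bucketed reports. The sketch below is illustrative only: it assumes each report has already been reduced to a single bucket identifier (the input format and the name `top_bucket_distribution` are hypothetical), and it takes shares relative to all reports, so the final cumulative value is the fraction of all reports covered by the top N buckets (the 30%–50% figure quoted above).

```python
from collections import Counter


def top_bucket_distribution(bucket_ids, top_n=20):
    """Return (bucket, share, cumulative_share) rows for the top_n buckets.

    bucket_ids: one bucket identifier per error report (hypothetical format).
    Shares are fractions of *all* reports, not just the top_n.
    """
    counts = Counter(bucket_ids)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    # most_common() orders buckets from most to least frequent (rank 1, 2, ...)
    for bucket, n in counts.most_common(top_n):
        share = n / total
        cumulative += share
        rows.append((bucket, share, cumulative))
    return rows
```

For example, with ten illustrative reports spread over buckets `A` (5), `B` (3), and `C` (2), `top_bucket_distribution(reports, top_n=2)` returns rows for `A` and `B`, whose cumulative share is 80% of all reports.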
An informal historical analysis indicates that WER has
helped improve the quality of many classes of third-party
kernel code for Windows. Figure 4 plots the frequency of
system crashes for various classes of kernel drivers for systems running Windows XP in March 2004, March 2005, and
March 2006, normalized against system crashes caused by
hardware failures in the same period. Assuming that the
expected frequency of hardware failures remained roughly
constant over that time period (something we cannot yet
prove), the number of system crashes for kernel drivers
has gone down every year except for two classes of drivers:
anti-virus and storage.
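The normalization behind Figure 4 divides each driver class's crash count by the hardware-failure crash count for the same period, which only controls for population changes under the stated assumption that hardware failures stay roughly constant. A minimal sketch, assuming per-year counts are available as plain dictionaries (the function names and input layout are hypothetical, not part of WER):

```python
def normalize_to_hardware(crashes_by_class, hardware_key="Hardware failure"):
    """Divide every class's crash count by the hardware-failure count."""
    baseline = crashes_by_class[hardware_key]
    if baseline == 0:
        raise ValueError("hardware-failure baseline must be non-zero")
    return {cls: n / baseline for cls, n in crashes_by_class.items()}


def classes_not_declining(yearly_counts, hardware_key="Hardware failure"):
    """Return driver classes whose normalized crash rate failed to drop
    in at least one year-over-year comparison.

    yearly_counts: dict mapping year -> {driver class: crash count}.
    """
    years = sorted(yearly_counts)
    normalized = {
        y: normalize_to_hardware(yearly_counts[y], hardware_key) for y in years
    }
    flagged = set()
    for prev, curr in zip(years, years[1:]):
        for cls, value in normalized[curr].items():
            # Skip the baseline itself (always 1.0) and classes absent last year
            if cls == hardware_key or cls not in normalized[prev]:
                continue
            if value >= normalized[prev][cls]:
                flagged.add(cls)
    return flagged
```

With illustrative (not WER) counts in which display crashes fall each year relative to hardware failures while storage crashes rise, only `"Storage"` is flagged, mirroring the kind of exception the text describes.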
As software providers begin to use WER more proactively, the incidence of error reports against their code declines dramatically. For example, in May 2007, one kernel-mode driver vendor began to use WER for the first time. Within 30 days, the vendor addressed the top 20 reported issues in its code. Within 5 months, as WER directed users to pick up fixes, the percentage of all kernel crashes attributed to the vendor dropped from 7.6% to 3.8%.

Figure 4. Crashes by driver class normalized to hardware failures for same period. [Bar chart; y-axis from 0.0 to 4.0; driver classes: Anti-virus, Application drivers, CD-burning, Hardware failure, Display, Multimedia, Networking, Printing, Storage; bars for 2004, 2005, 2006.]
5.3. Bucketing effectiveness
We know of two forms of weakness in the WER bucketing
heuristics: weaknesses in the condensing heuristics, which
result in mapping reports from a bug into too many buckets,
and weaknesses in the expanding heuristics, which result
in mapping more than one bug into the same bucket. An
analysis of error reports from the Microsoft Office 2010 ITP
shows that as many as 37% of these error reports may be
incorrectly bucketed due to poor condensing heuristics. An
analysis of all kernel crashes collected in 2008 shows that as
many as 14% of these error reports were incorrectly bucketed
due to poor expanding heuristics.
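The text does not say how the 37% and 14% estimates were computed. One plausible way to quantify the two weaknesses, assuming a ground-truth bug label is available for each report (for example, from triage), is sketched below; the metric and the name `bucketing_error_rates` are hypothetical, not WER's. Condensing weakness is counted as reports that fall outside their bug's most popular bucket; expanding weakness as reports that share a bucket with a different, more common bug.

```python
from collections import Counter, defaultdict


def bucketing_error_rates(reports):
    """Estimate (condensing_error, expanding_error) as fractions of reports.

    reports: list of (bucket_id, bug_id) pairs with ground-truth bug labels.
    """
    if not reports:
        return 0.0, 0.0
    buckets_per_bug = defaultdict(Counter)  # bug -> how its reports spread over buckets
    bugs_per_bucket = defaultdict(Counter)  # bucket -> which bugs it mixes together
    for bucket, bug in reports:
        buckets_per_bug[bug][bucket] += 1
        bugs_per_bucket[bucket][bug] += 1

    def outside_majority(groups):
        # Reports not in the largest sub-group of their group
        return sum(sum(c.values()) - max(c.values()) for c in groups.values())

    total = len(reports)
    split = outside_majority(buckets_per_bug)   # one bug in too many buckets
    merged = outside_majority(bugs_per_bucket)  # several bugs in one bucket
    return split / total, merged / total
```

For five reports where bug `X` is split across buckets `b1` (2 reports) and `b2` (1), and bugs `Y` and `Z` share bucket `b3`, each rate comes out to one report in five, or 20%.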
While not ideal, WER’s bucketing heuristics are effective in practice at identifying and quantifying the occurrence of errors caused by bugs in both software and hardware. In 2007, WER began receiving crash reports from computers with a particular processor. The error reports were easily bucketed based on an increase in system machine checks and processor type. When Microsoft approached the processor vendor, the vendor had already discovered the processor issue and documented it externally, but had no idea it could occur so frequently until presented with WER data. The vendor immediately released a microcode fix via WU (on day 10, the black bar in Figure 5), and within 2 days the number of error reports had dropped to just 20% of peak.
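The "20% of peak within 2 days" observation is a simple ratio over a daily report series. A minimal sketch, assuming a zero-indexed list of daily report counts and taking the peak over the days up to and including the fix day (both the input layout and the name `days_to_fall_below` are hypothetical):

```python
def days_to_fall_below(daily_counts, fix_day, fraction):
    """Return how many days after fix_day the daily report count first
    drops to fraction * peak or lower, or None if it never does.

    daily_counts: report counts per day, index 0 = first day observed.
    """
    # Peak over the period before and including the fix release
    peak = max(daily_counts[: fix_day + 1])
    threshold = fraction * peak
    for day in range(fix_day + 1, len(daily_counts)):
        if daily_counts[day] <= threshold:
            return day - fix_day
    return None
```

Measuring from the pre-fix peak rather than from the count on the fix day keeps a declining pre-fix trend from inflating the apparent effect of the fix.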
6. Conclusion
WER has changed the process of software development
at Microsoft. Development has become more empirical,
more immediate, and more user-focused. Microsoft teams
use WER to catch bugs after release, but perhaps as importantly, we use WER during internal and beta pre-release