DOI: 10.1145/3205911
Coz: Finding Code that Counts
with Causal Profiling
By Charlie Curtsinger and Emery D. Berger
Abstract
Improving performance is a central concern for software
developers. To locate optimization opportunities, developers rely on software profilers. However, these profilers only
report where programs spend their time: optimizing that
code may have no impact on performance. Past profilers
thus both waste developer time and make it difficult for
them to uncover significant optimization opportunities.
This paper introduces causal profiling. Unlike past profiling approaches, causal profiling indicates exactly where
programmers should focus their optimization efforts, and
quantifies their potential impact. Causal profiling works by
running performance experiments during program execution. Each experiment calculates the impact of any potential
optimization by virtually speeding up code: inserting pauses
that slow down all other code running concurrently. The key
insight is that this slowdown has the same relative effect as
running that line faster, thus “virtually” speeding it up.
We present Coz, a causal profiler, which we evaluate on a range of highly-tuned applications such as Memcached, SQLite, and the PARSEC benchmark suite. Coz identifies previously unknown optimization opportunities that are both significant and targeted. Guided by Coz, we improve the performance of Memcached by 9%, SQLite by 25%, and accelerate six PARSEC applications by as much as 68%; in most cases, these optimizations involve modifying under 10 lines of code.

1. INTRODUCTION

Improving performance is a central concern for software developers. While compiler optimizations are of some assistance, they often do not have enough of an impact on performance to meet programmers’ demands.2 Programmers seeking to increase the throughput or responsiveness of their applications thus must resort to manual performance tuning.

Manually inspecting a program to find optimization opportunities is impractical, so developers use profilers. Conventional profilers rank code by its contribution to total execution time. Prominent examples include oprofile, perf, and gprof.7, 9, 11 Unfortunately, even when a profiler accurately reports where a program spends its time, this information can lead programmers astray. Code that runs for a long time is not necessarily a good choice for optimization. For example, optimizing code that draws a loading animation during a file download will not make the program run faster, even though this code runs just as long as the download.

This phenomenon is not limited to I/O operations. Figure 1 shows a simple program that illustrates the shortcomings of existing profilers, along with its gprof profile as shown in Figure 2a. This program spawns two threads, which invoke functions fa and fb, respectively. Most profilers will report that these functions comprise roughly half of the total execution time. Other profilers may report that fa is on the critical path, or that the main thread spends roughly equal time waiting for fa and fb. While accurate, all of this information is potentially misleading. Optimizing fa away entirely will only speed up the program by 4.5% because fb becomes the new critical path.

Conventional profilers do not report the potential impact of optimizations; developers are left to make these predictions based on their understanding of the program. While these predictions may be easy for programs as simple as the one shown in Figure 1, accurately predicting the effect of a proposed optimization is nearly impossible for programmers attempting to optimize large applications.

This paper introduces causal profiling, an approach that accurately and precisely indicates where programmers should focus their optimization efforts, and quantifies their potential impact. Figure 2b shows the results of running Coz, our prototype causal profiler. This profile plots the hypothetical speedup of a line of code (x-axis) versus its impact on execution time (y-axis). The graph correctly shows that optimizing either fa or fb in isolation would have little effect.

A causal profiler conducts a series of performance experiments to empirically observe the effect of a potential optimization. Of course, it is not possible to automatically speed up any line of code by an arbitrary amount. Instead, a causal profiler uses the novel technique of virtual speedups to mimic the effect of an optimization: each time the selected code runs, pauses are inserted that slow down all other code running concurrently, which has the same relative effect as running the selected code faster.
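To make this concrete, the following hand-instrumented sketch emulates a single virtual speedup experiment for the program in Figure 1. It is only an illustration of the idea described above, not Coz's implementation: the file name, the function names, the division of each loop into 100 chunks, and the 25% virtual speedup are assumptions made for this example. The selected code (the loop in a) charges a pause to the other thread each time a chunk of it runs; subtracting the total inserted delay from the measured end-to-end time then approximates the runtime the program would have had if a really were 25% faster, which for this program should land near b's roughly 6.4 seconds.

virtual_speedup_sketch.cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>
using std::thread;
using namespace std::chrono;

std::atomic<long> inserted_delay_ns{0};  // pauses charged to all other threads

// The "selected" code: a()'s loop, split into chunks so that each chunk can
// charge a pause proportional to its own runtime (25% here).
void a_with_virtual_speedup(double speedup) {
  for (int chunk = 0; chunk < 100; chunk++) {
    auto begin = steady_clock::now();
    for (volatile size_t x = 0; x < 20000000; x++) {}   // 1/100th of a()'s work
    auto elapsed = duration_cast<nanoseconds>(steady_clock::now() - begin);
    inserted_delay_ns += static_cast<long>(speedup * elapsed.count());
  }
}

// Every other thread periodically pays off the delay charged so far.
// Simplified: assumes b() is still running when a() finishes charging.
void b_paying_delays() {
  long paid = 0;
  for (int chunk = 0; chunk < 100; chunk++) {
    for (volatile size_t y = 0; y < 19000000; y++) {}   // 1/100th of b()'s work
    long owed = inserted_delay_ns.load() - paid;
    if (owed > 0) {
      std::this_thread::sleep_for(nanoseconds(owed));
      paid += owed;
    }
  }
}

int main() {
  auto begin = steady_clock::now();
  thread a_thread(a_with_virtual_speedup, 0.25), b_thread(b_paying_delays);
  a_thread.join(); b_thread.join();
  long total_ms = static_cast<long>(
      duration_cast<milliseconds>(steady_clock::now() - begin).count());
  long delay_ms = inserted_delay_ns.load() / 1000000;
  // Subtracting the inserted delay estimates the runtime the program would
  // have had if a() really were 25% faster.
  printf("measured: %ld ms, estimated with a() 25%% faster: %ld ms\n",
         total_ms, total_ms - delay_ms);
}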
Figure 1. A simple multithreaded program that illustrates the shortcomings of existing profilers. Optimizing fa will improve performance by no more than 4.5%, while optimizing fb would have no effect on performance.

example.cpp
#include <cstddef>
#include <thread>
using std::thread;

void a() { // ~6.7 seconds
  for (volatile size_t x = 0; x < 2000000000; x++) {}
}
void b() { // ~6.4 seconds
  for (volatile size_t y = 0; y < 1900000000; y++) {}
}
int main() {
  // Spawn both threads and wait for them.
  thread a_thread(a), b_thread(b);
  a_thread.join(); b_thread.join();
}
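As a quick check on the 4.5% figure quoted in the introduction (using the approximate 6.7-second and 6.4-second timings noted in the listing, and assuming both threads run fully in parallel), removing a() entirely still leaves b() as the critical path; the file name below is hypothetical and the calculation is only this simple arithmetic:

critical_path_check.cpp
#include <algorithm>
#include <cstdio>

int main() {
  double t_a = 6.7, t_b = 6.4;            // approximate per-thread runtimes (seconds)
  double original = std::max(t_a, t_b);   // threads run concurrently, so the longer one dominates
  double a_removed = t_b;                 // with a() gone, b() is the critical path
  double best_case = (original - a_removed) / original;
  printf("best-case runtime reduction from optimizing a(): %.1f%%\n",
         100.0 * best_case);              // ~4.5%
}

The same reasoning is why the profile in Figure 2b shows little benefit from optimizing either function alone: neither can be sped up much without the other becoming the bottleneck.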
The original version of this paper was published in Proceedings of the ACM 2015 Symposium on Operating Systems Principles, 184–197.

This work was initiated and partially conducted while Charlie Curtsinger was a Ph.D. student at the University of Massachusetts Amherst.