example, that a variable is nonzero)
to ensure an optimization applies.
When hand-implementing an optimization, the programmer must carefully follow all of these constraints.
The paper describes Alive, a tool
that realizes the domain-specific
language for describing peephole
optimizations. Instead of writing an
optimization by hand, the compiler writer expresses it using natural
LLVM syntax. Alive generates a C++
implementation, thereby eliminating the ugly boilerplate. More importantly, Alive also verifies the proposed optimization is indeed correct:
the tool understands the semantics
of LLVM (including the tricky corner
cases), and checks that the optimized
code agrees with the original. When
an optimization is broken, Alive produces a counterexample showing
the problem. The tool can even infer
parts of the optimization, which lets
compiler writers be more aggressive
about producing good code, without
the fear of bugs.
To show that Alive works in practice,
the authors reimplemented hundreds
of peephole optimizations from the
LLVM suite and stress-tested them to
look for miscompilation bugs. They
This paper is not the end of the story
for compiler correctness—there are
other kinds of optimizations, analyses, and transformations not covered
here—but it does provide a large step
forward by showing how to build a tool
that is practical yet backed by modern
verification technology. In short, this
paper suggests a way to make better
compilers that are easier to build and
far less buggy than they are today.
Steve Zdancewic is a professor in the Department
of Computer Science and Information Science
at the University of Pennsylvania, Philadelphia, PA, USA.
Copyright held by author.
SOFTWARE ENGINEERS—AND, by extension, anyone who uses the software
they create (that is, nearly everyone)—
rely critically on compilers. These
ubiquitous tools, familiar even to the
most novice programmer, translate
high-level ideas, expressed as code,
into the low-level instructions understood by computer hardware.
Using compilers is so commonplace and transparent that most of us
soon forget about the near-miracles of
code analysis and optimization they
perform at a keystroke. Today’s compilers produce marvelously tuned and
optimized code that almost always
performs better than what could be
achieved painstakingly by hand.
Optimizing compilers are an essential
part of our computing infrastructure.
To achieve these near-miracles,
compilers themselves are very complex
beasts, comprising tens- or hundreds-of-thousands of lines of code, developed over many years by many people.
And, just as with any software, they too
are potentially buggy. A broken compiler might produce erroneous machine
code, thereby converting the programmer’s intended algorithm into gibberish, or, worse, into a subtly flawed
product that sometimes gets the wrong
answer or has a pitfall waiting for just
the right input to be triggered.
Fortunately, thanks to their frequent use, compilers tend to be
very well-tested software: they do
their job correctly most of the time
for most programs. Nevertheless,
compiler bugs do exist—as demonstrated by Regehr and his collaborators in prior work using their Csmith
tool. They can often be provoked by
surprisingly small (but tricky!) programs that involve the corner-cases
of fixed-bitwidth arithmetic calculations, which are incorrectly optimized when generating low-level
code. These kinds of compiler bugs
are potentially catastrophic, notoriously difficult for programmers to
diagnose, and require real expertise
to track down and fix.
So, what do we do about them?
The authors of the following paper give us a compelling and practical answer. Their main idea is to
raise the level of abstraction used
by compiler implementors, allowing them to express optimizations
by using a domain-specific language
suited perfectly for the task.
The authors explain their ideas in
the context of the LLVM intermediate
representation, which, owing to the
widespread use of LLVM in both academia and industry (most prominently
by Apple), makes the results in this paper immediately applicable—indeed
the authors found numerous bugs in
the LLVM implementation, and have
been working with LLVM developers
to find more.
The paper focuses on peephole
optimizations, in which the compiler
rewrites a short sequence of instructions into a more efficient sequence,
for instance, by replacing multiplication by bitshift and addition operations. The traditional way to implement this kind of thing in LLVM is
for the compiler implementor to
manually write a bunch of C++ code
that case-analyzes the LLVM instructions looking for a specific pattern
that should be replaced.
Why is it difficult for compiler
implementors to get this right? At a
superficial level, such code consists
of a lot of fiddly boilerplate, and,
when there are hundreds of peephole
optimizations, as in an industrial-strength compiler like LLVM, bugs
are bound to creep in.
But the problem is actually much
deeper. One difficulty is the semantics for even “simple” arithmetic operations is not so straightforward.
These operations include many corner cases: underflows, overflows, and
undefined behaviors, all of which
must be properly accounted for to ensure an optimization is correct. Moreover, not all optimizations apply in all
situations. The compiler might need
to know facts about the program (for
To view the accompanying paper,
visit doi.acm.org/10.1145/3166064 Technical Perspective
Building Bug-Free Compilers
By Steve Zdancewic