equivalent C++ implementation, is not expressed in terms
of LLVM’s internal data structures and control flow, and can
be automatically verified by the Alive tool kit. This transformation illustrates 2 forms of abstraction supported by Alive:
abstraction over choice of a compile-time constant and
abstraction over bitwidth.
Constant expressions. To allo w algebraic simplifications
and constant folding, Alive includes a language for con-
stant expressions. A constant expression may be a literal,
an abstract constant (e.g., C in the example on the previous
page), or a unary or binary operator applied to 1 or 2 constant
expressions. The operators include signed and unsigned
arithmetic operators and bitwise logical operators. Alive
also supports functions on constant expressions. Built-in
functions include type conversions and mathematical and
bitvector utilities (e.g., abs(), umax(), width() ).
So far Alive has helped us discover twenty-three previously unknown bugs in the LLVM InstCombine transformations. Furthermore, we have prevented dozens of bugs from
getting into LLVM by monitoring the various InstCombine
patches as they were committed to the LLVM subversion
repository. Several LLVM developers are currently using the
Alive prototype to check their InstCombine transformations. Alive is open source and it is also available on-line at
2. 2. Type system
2. THE ALIVE LANGUAGEa
We designed Alive to resemble the LLVM IR because our
users—the LLVM developers—are already experts with it.
Alive’s most important features include its abstraction
over choice of constants, over the bitwidths of operands
(Section 2. 2), and over LLVM’s instruction attributes that
control undefined behavior (Section 2. 4).
Alive supports a subset of LLVM’s types, such as integers and
pointers. LLVM uses arbitrarily-sized integers, with a separate type for each width (e.g., i8 or i57). Alive has limited
support for LLVM’s pointer and array types, and does not supports structures or vectors. Recent efforts have subsequently
extended Alive with support for floating-point types. 14, 16
Unlike LLVM, Alive permits type annotations to be
omitted and does not require values to have a unique type.
This enables succinct specifications of optimizations in
Alive, as many peephole optimizations are not type-spe-cific. A set of possible types is inferred for each implic-itly-typed value, and the correctness of an optimization
is checked for each type assignment. Because LLVM has
infinitely many integer types, we set an upper bound of 64
bits for implicitly typed integer values.
2. 3. Built-In predicates
2. 1. Syntax
An Alive transformation has the form A ⇒ B, where A is the
source template (unoptimized code) and B is the target template (optimized code). Additionally, a transformation may
include a precondition. Since Alive’s representation, like
LLVM’s, is based on directed graphs of instructions in Static
Single Assignment (SSA) form, the ordering of non-depen-dent instructions is irrelevant.
Alive implements a subset of LLVM’s integer and pointer
instructions. It also has limited support for branches: to
avoid loops, they are not allowed to jump backwards. Alive
supports LLVM’s nsw, nuw, and exact instruction attributes that weaken the behavior of integer instructions by
adding undefined behaviors.
Some peephole optimizations use the results of dataflow analyses. Alive makes these results available using
a collection of built-in predicates such as isPowerOf2(),
MaskedValueIsZero(), and WillNotOverflowSignedAdd().
The analyses producing these results are trusted by Alive:
verifying their correctness is not within Alive’s scope.
Predicates can be combined with the usual logical connectives. Figure 2 shows an example transformation that
includes a built-in predicate in its precondition.
2. 4. Undefined behaviors in LLVM
Scoping. The source and target templates must have
a common root variable that is the root of the respective
graphs. The remaining variables are either inputs to the
transformation or else temporary variables produced
by instructions in the source or target template. Inputs
are visible throughout the source and target templates.
Temporaries defined in the source template are in scope for
the precondition, the target, and the remaining part of the
source from the point of definition. Temporaries declared
in the target are in scope for the remainder of the target. To
help catch errors, every temporary in the source template
must be used in a later source instruction or be overwritten
in the target, and all target instructions must be used in a
later target instruction or overwrite a source instruction.
To aggressively optimize well-defined programs, LLVM
has 3 distinct kinds of undefined behavior. Together, they
enable many desirable optimizations, and LLVM aggressively exploits these opportunities.
Undefined behavior in LLVM resembles undefined behavior in C and C++: anything may happen to a program that
executes it. The compiler may simply assume that undefined behavior does not occur; this assumption places a
Figure 2. An example illustrating many of Alive’s features. ( (B ∨ V)
∧ C1) ∨ (B ∧ C2) can be transformed to (B ∨ V) ∧ (C1 ∨ C2) when
C1 ∧ C2 = 0 and when the predicate MaskedValueIsZero(V, ¬C1)
is true, indicating that an LLVM dataflow analysis has concluded
that V ∧ ¬C1 = 0. B and V are input variables. C1 and C2 are
constants. t0, t1, and t2 are temporaries. This transformation
is rooted at R.
Pre: C1 & C2 == 0 && MaskedValueIsZero(%V, ~C1)
%t0 = or %B, %V
%t1 = and %t0, C1
%t2 = and %B, C2
%R = or %t1, %t2
%R = and %t0, (C1 | C2)
a The latest version of Alive can be found at https://github.com/nunoplopes/alive.