An analogous nuw attribute exists to rule out unsigned
wrap. If an add, subtract, multiply, or left shift operation
with an nsw or nuw attribute overflows, the result is poison.
Additionally, LLVM’s right shift and divide instructions
have an exact attribute that requires an operation to not be
lossy. Table 2 provides the constraints for the instructions to
be poison-free. Developers writing Alive patterns can omit
instruction attributes, in which case Alive infers where they
can be safely placed.
3. VERIFYING OPTIMIZATIONS IN ALIVE
The Alive interpreter verifies a transformation by automatically encoding the source and target, their definedness
conditions, and the overall correctness criteria into SMT
queries. An Alive transformation is parametric over the set
of all feasible types: the concrete types satisfying the constraints of LLVM’s type system and not exceeding the default
limit of 64 bits.
In the absence of undefined behavior in the source or
target of an Alive transformation, we can check correctness using a straightforward equivalence check: for each
valuation of the input variables, the value of any variable
that is present in both the source and target must be identical. However, an equivalence check is not sufficient to
prove correctness in the presence of any of the 3 kinds
of undefined behavior described in Section 2. 4. We use
refinement to reason about optimizations in the presence of undefined behavior. The target of an Alive transformation refines the source template if all the behaviors
of the target are included in the set of behaviors of the
source. That is, a transformation may remove undefined
behaviors but not add them.
When an optimization contains or may produce undef
values, we need to ensure that the target never produces a
value that the source does not produce. In other words, an
undef in the source represents a set of values and the tar-
get can refine it to any particular value, but an undef in the
corresponding obligation on the program developer (or on
the compiler and language runtime, when a safe language
is compiled to LLVM) to ensure that undefined operations
are never executed. An instruction that executes unde-
fined behavior can be replaced with an arbitrary sequence
of instructions. When an instruction executes undefined
behavior, all subsequent instructions can be considered
undefined as well.
Table 1 shows when Alive’s arithmetic instructions have
defined behavior, following the LLVM IR specification. For
example, the udiv instruction is defined only when the
dividend is non-zero. With the exception of memory access
instructions (discussed in the original paper10), instructions
not listed in Table 1 are always defined.
The undefined value (undef in the IR) is a limited form of
undefined behavior that mimics a free-floating hardware register than can return any value each time it is read. Semantically,
undef stands for the set of all possible bit patterns for a particular type; the compiler is free to pick a convenient value for
each use of undef to enable aggressive optimizations. For
example, a 1-bit undefined value, sign-extended to 32 bits,
produces a variable containing either all 0s or all 1s.
Poison values, which are distinct from undefined values, are used to indicate that a side-effect-free instruction
has a condition that produces undefined behavior. When
the poison value gets used by an instruction with side
effects, the program exhibits true undefined behavior.
Hence, poison values are deferred undefined behaviors:
they are intended to support speculative execution of
possibly-undefined operations. Poison values taint subsequent dependent instructions; unlike undef, poison
values cannot be untainted by subsequent operations.
The subtleties in the semantics of undef and poison values and its impact on either enabling or disabling optimizations are currently being explored. 8
Shift instructions, shl, ashr, and lshr, produce a poison value when their second argument, the shift amount, is
larger than or equal to the bit width of the operation.
Instruction attributes modify the behavior of some
LLVM instructions. The nsw attribute (“no signed wrap”)
makes signed overflow undefined. For example, this Alive
transformation, which is equivalent to the optimization
of (x+ 1)>x to 1 in C and C++ where x is a signed integer,
1 = add nsw %x, 1
% 2 = icmp sgt 1, %x
% 2 = true
Table 1. The constraints for arithmetic instructions to be defined.
<u is unsigned less-than. INT_MIN is the smallest signed integer
value for a given bitwidth.
Instruction Definedness constraint
sdiv a, b b ≠ 0 ∧ (a ≠ INT_MIN ∨ b ≠ − 1)
udiv a, b b ≠ 0
srem a, b b ≠ 0 ∧ (a ≠ INT_MIN ∨ b ≠ − 1)
urem a, b b ≠ 0
Table 2. The constraints for arithmetic instructions to be poison-free.
>>u and ÷u are the unsigned shift and division operations. B is the
bitwidth of the operands. SExt(a, n) sign-extends a by n bits;
ZExt(a, n) zero-extends a by n bits.
Instruction Constraints for poison-free execution
add nsw a, b SExt(a, 1) + SExt(b, 1) = SExt(a + b, 1)
add nuw a, b ZExt(a, 1) + ZExt(b, 1) = ZExt(a + b, 1)
sub nsw a, b SExt(a, 1) − SExt(b, 1) = SExt(a − b, 1)
sub nuw a, b ZExt(a, 1) − ZExt(b, 1) = ZExt(a − b, 1)
mul nsw a, b SExt(a, B) × SExt(b, B) = SExt(a × b, B)
mul nuw a, b ZExt(a, B) × ZExt(b, B) = ZExt(a × b, B)
sdiv exact a, b (a ÷ b) × b = a
udiv exact a, b (a ÷u b) × b = a
shl a, b b <u B
shl nswa,b b<uB∧ (a<<b)>>b=a
shl nuwa,b b<uB∧ (a<<b)>>ub=a
ashr a, b b <u B
ashr exact a, b b <u B ∧ (a >> b) << b = a
lshr a, b b <u B
lshr exact a, b b <u B ∧ (a >>u b) << b = a