Internal functions:
F ::= {name = id; sig = sig;
→
params = r ;
stacksize = n;
entrypoint = l;
code = g}
parameters
size of stack data block
label of first instruction
control-flow graph
→
l if eval_cond(cond, R(r )) = true
l′ = true →
l if eval_cond(cond, R(r )) = false
false
⊥
e
G S(S, g, s, l, R, M) → S(S, g, s,l′, R, M)
External functions:
Fe ::= {name = id; sig = sig}
Each instruction takes its arguments in a list of pseudo-
→
registers r and stores its result, if any, in a pseudo-register
r. Additionally, it carries the labels l of its possible successors. Instructions include arithmetic operations op (with
an important special case op(move, r, r′, l) representing
a register-to-register copy), memory loads and stores (of a
quantity κ at the address obtained by applying addressing
→
mode mode to registers r ), conditional branches (with two
successors), and function calls, tail-calls, and returns.
An RTL program is composed of a set of named functions, either internal or external. Internal functions are
defined within RTL by their CFG, entry point in the CFG,
and parameter registers. External functions are not defined
but merely declared: they model input/output operations
and similar system calls. Functions and call instructions
carry signatures sig specifying the number and register
classes (int or float) of their arguments and results.
The dynamic semantics of RTL is specified in small-step
operational style, as a labeled transition system. The predicate G S → S′ denotes one step of execution from state S
t
⊥
to state S′. The global environment G maps function pointers and names to function definitions. The trace t records
the input–output events performed by this execution step:
it is empty (t = e) for all instructions except calls to external functions, in which case t records the function name,
parameters, and results of the call.
Execution states S are of the form S(Σ, g, s, l, R, M)
where g is the CFG of the function currently executing, l
the current program point within this function, and s a
memory block containing its activation record. The register state R maps pseudo-registers to their current values
(discriminated union of 32-bit integers, 64-bit floats, and
pointers). Likewise, the memory state M maps (pointer,
memory quantity) pairs to values, taking overlap between
multi-byte quantities into account. 14 Finally, Σ models the call stack: it records pending function calls with
their ( g, s, l, R) components. Two slightly different forms
of execution states, call states and return states, appear
when modeling function calls and returns, but will not be
described here.
To give a flavor of RTL’s semantics, here are two of the
rules defining the one-step transition relation, for arithmetic operations and conditional branches, respectively:
4. 2. The register allocation algorithm
The goal of the register allocation pass is to replace the
pseudo-registers r that appear in unbounded quantity in
the original RTL code by locations l, which are either hardware registers (available in small, fixed quantity) or abstract
stack slots in the activation record (available in unbounded
quantity). Since accessing a hardware register is much
faster than accessing a stack slot, the use of hardware registers must be maximized. Other aspects of register allocation, such as insertion of reload and spill instructions to
access stack slots, are left to subsequent passes.
Register allocation starts with a standard liveness analysis performed by backward dataflow analysis. The dataflow
equations for liveness are of the form
LV(l ) = ∪ { T (s, LV(s)) | s successor of l}
( 8)
→→
g (l ) = op(op, r , r, l′) eval_op(G, s, op, R(r )) = u
e
G S(S, g, s, l, R, M) → S(S, g, s, l′, R{r ← u}, M)
⊥
→
g(l) = cond(cond, r , l , l )
true false
The transfer function T(s, LV(s)) computes the set of
pseudo-registers live “before” a program point s as a function of the pseudo-registers LV(s) live “after” that point. For
→
instance, if the instruction at s is op(op, r , r, s′), the result
r becomes dead because it is redefined at this point, but
→
the arguments r become live. because they are used at
→
this point: T(s, LV(s) ) = (LV(s){r}) ∪ r . However, if r is dead
“after” (r ∉ L(s) ), the instruction is dead code that will be
eliminated later, so we can take T(s, LV(s) ) = LV (s) instead.
The dataflow equations are solved iteratively using
Kildall’s worklist algorithm. CompCert provides a generic
implementation of Kildall’s algorithm and of its correctness proof, which is also used for other optimization passes.
The result of this algorithm is a mapping LV from program
points to sets of live registers that is proved to satisfy the
correctness condition LV(l) ⊇ T(s, LV(s) ) for all s successor
of l. We only prove an inequation rather than the standard
dataflow equation ( 8) because we are interested only in the
correctness of the solution, not in its optimality.
An interference graph having pseudo-registers as nodes
is then built following Chaitin’s rules, 6 and proved to contain all the necessary interference edges. Typically, if two
pseudo-registers r and r′ are simultaneously live at a program point, the graph must contain an edge between r and
r ′. Interferences are of the form “these two pseudo-registers
interfere” or “this pseudo-register and this hardware register interfere,” the latter being used to ensure that pseudo-registers live across a function call are not allocated to
caller-save registers. Preference edges (“these two pseudo-registers should preferably be allocated the same location”
or “this pseudo-register should preferably be allocated this
location”) are also recorded, although they do not affect
correctness of the register allocation, just its quality.
The central step of register allocation consists in coloring the interference graph, assigning to each node r
a “color” j (r) that is either a hardware register or a stack
slot, under the constraint that two nodes connected by an