researchhighlights
3. oVeRVie W of The CoMPCeRT CoMPiLeR
3. 1. The source language
The source language of the CompCert compiler, called
Clight, 5 is a large subset of the C programming language,
comparable to the subsets commonly recommended for
writing critical embedded software. It supports almost
all C data types, including pointers, arrays, struct and
union types; all structured control (if/then, loops,
break, continue, Java-style switch); and the full power
of functions, including recursive functions and function
pointers. The main omissions are extended-precision arithmetic (long long and long double); the goto statement;
nonstructured forms of switch such as Duff’s device; passing struct and union parameters and results by value;
and functions with variable numbers of arguments. Other
features of C are missing from Clight but are supported
through code expansion (de-sugaring) during parsing: side
effects within expressions (Clight expressions are side-effect
free) and block-scoped variables (Clight has only global and
function-local variables).
The semantics of Clight is formally defined in big-step
operational style. The semantics is deterministic and makes
precise a number of behaviors left unspecified or undefined
in the ISO C standard, such as the sizes of data types, the
results of signed arithmetic operations in case of overflow,
and the evaluation order. Other undefined C behaviors are
consistently turned into “going wrong” behaviors, such
as dereferencing the null pointer or accessing arrays out
of bounds. Memory is modeled as a collection of disjoint
blocks, each block being accessed through byte offsets;
pointer values are pairs of a block identifier and a byte offset.
This way, pointer arithmetic is modeled accurately, even in
the presence of casts between incompatible pointer types.
3. 2. Compilation passes and intermediate languages
The formally verified part of the CompCert compiler translates from Clight abstract syntax to PPC abstract syntax, PPC
being a subset of PowerPC assembly language. As depicted
in Figure 1, the compiler is composed of 14 passes that
go through eight intermediate languages. Not detailed in
Figure 1 are the parts of the compiler that are not verified
yet: upstream, a parser, type-checker and simplifier that generates Clight abstract syntax from C source files and is based
on the CIL library21; downstream, a printer for PPC abstract
syntax trees in concrete assembly syntax, followed by generation of executable binary using the system’s assembler
and linker.
The front-end of the compiler translates away C-specific
features in two passes, going through the C#minor and
Cminor intermediate languages. C#minor is a simplified,
typeless variant of Clight where distinct arithmetic operators
are provided for integers, pointers and floats, and C loops
are replaced by infinite loops plus blocks and multilevel
exits from enclosing blocks. The first pass translates C loops
accordingly and eliminates all type-dependent behaviors:
operator overloading is resolved; memory loads and stores,
as well as address computations, are made explicit. The
next intermediate language, Cminor, is similar to C#minor
with the omission of the & (address-of) operator. Cminor
function-local variables do not reside in memory, and their
address cannot be taken. However, Cminor supports explicit
stack allocation of data in the activation records of functions. The translation from C#minor to Cminor therefore
recognizes scalar local variables whose addresses are never
taken, assigning them to Cminor local variables and making them candidates for register allocation later; other local
variables are stack-allocated in the activation record.
The compiler back-end starts with an instruction selection pass, which recognizes opportunities for using combined arithmetic instructions (add-immediate, not-and,
rotate-and-mask, etc.) and addressing modes provided
by the target processor. This pass proceeds by bottom-up
rewriting of Cminor expressions. The target language is
CminorSel, a processor-dependent variant of Cminor that
offers additional operators, addressing modes, and a class
figure 1: Compilation passes and intermediate languages.
Parsing, elaboration
(not verified)
Simplifications
type elimination
Stack pre-allocation
Clight
C#minor
Cminor
Branch tunneling
LCM CSE
Instruction
selection
LTLin
Code
linearization
LTL
Register
allocation
RTL
CFG
construction
CminorSel
Spilling, reloading
calling conventions
Instr. scheduling
Constant propagation
Linear
Layout of
stack frames
Mach
PowerPC code
generation
PPC
Assembling, linking
(not verified)