Doi: 10.1145/1538788.1538814
Formal Verification
of a Realistic Compiler
By Xavier Leroy
Abstract
This paper reports on the development and formal verification (proof of semantic preservation) of CompCert, a
compiler from Clight (a large subset of the C programming
language) to PowerPC assembly code, using the Coq proof
assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the
context of critical software and its formal verification: the
verification of the compiler guarantees that the safety properties proved on the source code hold for the executable
compiled code as well.
1. in TRoDuCTion
Can you trust your compiler? Compilers are generally
assumed to be semantically transparent: the compiled
code should behave as prescribed by the semantics of the
source program. Yet, compilers—and especially optimizing
compilers—are complex programs that perform complicated symbolic transformations. Despite intensive testing,
bugs in compilers do occur, causing the compilers to crash
at compile-time or—much worse—to silently generate an
incorrect executable for a correct source program.
For low-assurance software, validated only by testing,
the impact of compiler bugs is low: what is tested is the
executable code produced by the compiler; rigorous testing
should expose compiler-introduced errors along with errors
already present in the source program. Note, however, that
compiler-introduced bugs are notoriously difficult to expose
and track down. The picture changes dramatically for safety-critical, high-assurance software. Here, validation by testing reaches its limits and needs to be complemented or
even replaced by the use of formal methods such as model
checking, static analysis, and program proof. Almost universally, these formal verification tools are applied to the source
code of a program. Bugs in the compiler used to turn this
formally verified source code into an executable can potentially invalidate all the guarantees so painfully obtained by
the use of formal methods. In future, where formal methods
are routinely applied to source programs, the compiler could
appear as a weak link in the chain that goes from specifications to executables. The safety-critical software industry is
aware of these issues and uses a variety of techniques to alleviate them, such as conducting manual code reviews of the
generated assembly code after having turned all compiler
optimizations off. These techniques do not fully address the
issues, and are costly in terms of development time and program performance.
An obviously better approach is to apply formal methods to the compiler itself in order to gain assurance that it
preserves the semantics of the source programs. For the last
5 years, we have been working on the development of a
realistic, verified compiler called CompCert. By verified, we mean
a compiler that is accompanied by a machine-checked proof
of a semantic preservation property: the generated machine
code behaves as prescribed by the semantics of the source
program. By realistic, we mean a compiler that could realistically be used in the context of production of critical software.
Namely, it compiles a language commonly used for critical
embedded software: neither Java nor ML nor assembly code,
but a large subset of the C language. It produces code for a
processor commonly used in embedded systems: we chose
the PowerPC because it is popular in avionics. Finally, the
compiler must generate code that is efficient enough and
compact enough to fit the requirements of critical embedded systems. This implies a multipass compiler that features
good register allocation and some basic optimizations.
Proving the correctness of a compiler is by no ways a
new idea: the first such proof was published in 196716 (for
the compilation of arithmetic expressions down to stack
machine code) and mechanically verified in 1972.17 Since
then, many other proofs have been conducted, ranging from
single-pass compilers for toy languages to sophisticated
code optimizations. 8 In the CompCert experiment, we carry
this line of work all the way to end-to-end verification of a
complete compilation chain from a structured imperative
language down to assembly code through eight intermediate
languages. While conducting the verification of CompCert,
we found that many of the nonoptimizing translations performed, while often considered obvious in the compiler literature, are surprisingly tricky to formally prove correct.
This paper gives a high-level overview of the CompCert
compiler and its mechanized verification, which uses the
Coq proof assistant. 3, 7 This compiler, classically, consists of
two parts: a front-end translating the Clight subset of C to a
low-level, structured intermediate language called Cminor,
and a lightly optimizing back-end generating PowerPC
assembly code from Cminor. A detailed description of Clight
can be found in Blazy and Leroy5; of the compiler front-end
in Blazy et al. 4; and of the compiler back-end in Leroy. 11, 13
The complete source code of the Coq development, extensively commented, is available on the Web. 12
The remainder of this paper is organized as follows.
Section 2 compares and formalizes several approaches to
establishing trust in the results of compilation. Section 3
A previous version of this paper was published in
Proceedings of the 33rd Symposium on the Principles of
Programming Languages. ACM, NY, 2006.