did not try to incorporate all the latest advances for multicore
kernels into mC2.
Regarding specification, there are 450 lines of Coq code
(LOC) to specify the system calls (the topmost layer interface; see Table 1) and 943 LOC to specify the x86 hardware
machine model (the bottommost layer interface). These are
in our trusted computing base. We keep them small to limit
the room for errors and ease the review process.
Our assembly machine assumes strong sequential consistency for all atomic instructions. We believe our proof
should remain valid for the x86 TSO model because ( 1) all
our concurrent layers guarantee that nonatomic memory
accesses are properly synchronized; and ( 2) the TSO order
guarantees that all atomic synchronization operations are
properly ordered. Nevertheless, more formalization work
is needed to turn our proofs over sequential-consistent
machines into those over the TSO machines. 23
Also, our machine model only covers a small portion of
the x86 hardware features and cannot be used to verify some
kernel components, such as a bootloader, a PreInit module
(which initializes the CPUs and the devices), an ELF loader,
and some device drivers (e.g., disk driver). Their verification
is left for future work.
We also trust the Coq proof checker and the CompCertX
assembler for converting assembly into machine code.
3. CONCURRENT LAYER MACHINES
In this section, we explain the concurrent layer design principles, and show how to introduce per-CPU layer interfaces,
based on a multicore hardware machine model.
Πx86mc multicore hardware model allows arbitrary inter-
leavings at the level of assembly instructions. At each step, the
hardware nondeterministically picks one CPU and executes
the next assembly instruction on that CPU. Each assembly
instruction is classified as atomic, shared, or private, depend-
ing on the memory it accesses. One interleaving of an exam-
ple program running on two CPUs is:
atom1 shared1 shared1
private2 atom2 private1
The memory locations are logically categorized into two
kinds: the ones private to a single CPU/thread and the ones
shared by multiple CPUs/threads. Private memory accesses
do not need to be synchronized, whereas nonatomic shared
memory accesses need to be protected by some synchronization mechanisms (e.g., locks), which are normally implemented using atomic instructions (e.g., fetch-and-add).
With proper protection, each shared memory operation can
be viewed as if it were atomic.
The atomic object is an abstraction of some segment of
well-synchronized shared memory, combined with operations that can be performed over that segment. It consists
of a set of primitives, an initial state, and a logical log containing the entire history of the operations that were performed on the object during an execution schedule. Each
primitive invocation records a single corresponding event
in the log. For example, the above interleaving produces the
logical log We require that these events contain enough information so we can derive the current state
of each atomic object by replaying the entire log over the
object’s initial state.
As shown in Figure 4, a concurrent layer interface contains
both private objects (e.g., Oi) and atomic objects (e.g., Oj),
along with some invariants imposed on them. These objects
are verified by building certified concurrent layers via for-
ward simulations, which imply strong contextual refinement
Definition 1 (Contextual Refinement). We say that
machine contextually refines machine (written as
), if, and only if, for any P that does not get
stuck on , we also have that ( 1) P does not get stuck on ;
and ( 2) any observable behavior of P on is also observed
However, proving such contextual refinements directly on
a multicore, nondeterministic hardware model is difficult
Table 1. Verified system calls of the mC2 hypervisor kernel.
kernel_init, get_quota, send, recv, rz_spawn, spawn, sleep, yield,
wakeup, kill, getc, putc, get_tsc_per_ms, get_curid, vm_exit_info,
vm_mmap, vm_set_seg, vm_get_reg, vm_set_reg, vm_get_next_eip,
vm_inject_event, vm_check_int_shadow, vm_run, vm_check_pending_
event, vm_intercept_int_window, vm_get_tsc_offset, vm_set_tsc_offset,
Figure 4. The overlay interface L2 is a more abstract interface, built on top of the underlay interface L1, and implemented by private module Mi
and shared module Mj. Private objects in L2 only access the private memory of L1. Atomic objects are implemented by shared modules (e.g.,
Macq in Figure 1) that may access lower-level atomic objects (e.g., FAI_t), private objects, and shared memory. Memory regions of L1 accessed
by the layer implementation are hidden and replaced by newly introduced objects of L2. The simulation relation R is defined between these
memory regions and objects, for example, R1 in Section 2. Then, the certified concurrent layer L1R Mi ⊕ Mj : L2 can be built by proving the
forward simulation: whenever two states s1, s2 are related by R, and running any P over the layer machine based on L2 takes s2 to s2′ in one
step, then there exists s1′ such that running P ⊕ Mi ⊕ Mj over L1 takes s1 to s1′ in multiple steps, and s1′ and s2′ are also related by R.
Private Mem Private Obj Shared Mem
Shared Mem Private Obj
Atom Obj S2
R R RR