6. RELATED WORK
Dijkstra9 proposed to “realize” a complex program by
decomposing it into a hierarchy of linearly ordered abstract
machines. Based on this idea, Gu et al.10 developed new
languages and tools for building certified abstraction layers with deep specifications, and showed how to apply the
layered methodology to construct certified (sequential) OS
kernels in Coq. Costanzo et al. 8 showed how to prove security properties over a deep specification of a certified OS
kernel, and then propagate these properties from the specification level to its correct assembly-level implementation.
Chen et al.5 extended the layered methodology to build certified kernels and device drivers running on multiple
logical CPUs. They treated the driver stack for each device as if
it were running on a logical CPU dedicated to that device.
Logical CPUs do not share any memory, and are all eventually mapped onto a single physical CPU. None of these systems, however, can support shared-memory concurrency
with fine-grained locking.
The seL4 team17 was the first to verify the functional correctness and security properties of a high-performance
L4-family microkernel. The seL4 microkernel, however, does
not support multicore concurrency with fine-grained locking. Peters et al.22 and von Tessin23 argued that, for an seL4-like microkernel, concurrent data accesses across multiple CPUs can be reduced to a minimum, so a single big kernel lock (BKL) might suffice for good performance on multicore machines. von Tessin23 further showed
how to convert the single-core seL4 proofs into proofs for a
BKL-based clustered multikernel.
The Verisoft team1 applied the VCC framework7 to formally verify Hyper-V, a widely deployed multiprocessor hypervisor consisting of 100 kLOC of C code and 5 kLOC of assembly. However, only 20% of the code was verified7, and only for function contracts and type invariants rather than the full functional correctness property. CIVL14 uses the state-machine approach with support
for atomic actions and movers to reduce the proof burden
for concurrent programs. It is implemented as an extension to Boogie and has been used to verify a concurrent
garbage collector. However, CIVL can only be used to reason about safety rather than liveness. There is a large body of other work6, 13, 24 showing how to build verified OS kernels.
Figure 8a shows the results of our latency measurement.
In the single-core case, ticket locks impose 34 cycles of
overhead, whereas MCS locks impose 74 cycles, as shown in the line chart. As the number of cores grows, the latency increases rapidly. Since the slowdown should ideally be proportional to the number of cores, we normalize the latency against the single-core baseline multiplied by the number of cores (n*t1/tn) to show the actual efficiency of each lock implementation. As the bar chart shows, efficiency remains roughly constant for MCS locks but decreases for ticket locks.
Now that we have compared MCS locks with ticket locks,
we present the remaining evaluations in this section using
only the ticket lock implementation of mC2.
5.5. Hypervisor performance
To evaluate mC2 as a hypervisor, we measured the performance of some macro benchmarks on Ubuntu 12.04.2 LTS
running as a guest. We ran the benchmarks on Linux as a guest in both KVM and mC2, as well as on bare metal. The guest Ubuntu is installed on an internal SSD drive; KVM and mC2 are installed on a USB stick. We use standard 4KB pages in every setting; huge pages are not used.
Figure 8b contains a compilation of standard macro
benchmarks: unpacking a Linux 4.0-rc4 kernel archive,
compiling the Linux 4.0-rc4 kernel source, running Apache
HTTPerf on loopback, and the DaCapo Benchmark suite 9.12. We
normalize the running times of the benchmarks using the
bare metal performance as a baseline (100%). The overhead
of mC2 is moderate and comparable to KVM. In some cases,
mC2 performs better than KVM; we suspect this is because
KVM has a Linux host and thus has a larger cache footprint.
For benchmarks with a large number of file operations, such
as Uncompress Linux source and Tomcat, mC2 performs
worse. This is because mC2 exposes the raw disk interface
to the guest via VirtIO (instead of passing it through), and its
disk driver does not provide good buffering support.
Figure 8. (a) The comparison between actual efficiency of ticket lock and MCS lock implementations in mC2; (b) normalized performance for
macro benchmarks running over Linux on KVM versus Linux on mC2; the baseline is Linux on bare metal; a smaller ratio is better.