register (EIP) and stack pointer register (ESP) are saved and
restored in this procedure, this kernel context switch function does not satisfy the C calling convention and has to be
verified at the assembly level. Based on this context switch
function and the shared queue library, we can verify three
scheduling primitives: yield, sleep, and wakeup (see Figure 7).
Thread-local machine models can be built based on the
thread management layers. The first step is to extend the
environment context with a software scheduler (i.e., abstracting the concrete scheduling procedure), resulting in a new
environment context εss. The scheduling primitives generate the corresponding scheduling events, to which εss responds
with the next thread ID to execute. The second step is to
introduce the active thread set to represent the active threads
on the active CPU, and extend εss with the context threads,
that is, the rest of the threads running on the active CPU.
The composition structure is similar to the one of Lemma 3.
In this way, higher layers can be built upon a thread-local
machine with a single active thread on the active CPU (see

A condition variable (CV) is a synchronization object
that enables a thread to wait for a change to be made to a
shared state. Standard Mesa-style CVs18 do not guarantee starvation-freedom: a thread waiting on a CV may not
be signaled within a bounded number of execution steps.
We have implemented a starvation-free CV using condition queues, following a popular, up-to-date OS textbook.2 However, we found a bug in the FIFOBBQ implementation presented in that textbook. Their system
can get stuck in two cases: (1) when a destroyed CV is kept inside the remove queue (rmvQ), it blocks the insert call from waking up the proper waiter; (2) when multiple CV threads are woken up within a short period and the lock-holding CV thread is not at the head of rmvQ, that thread is removed from rmvQ and returns to sleep, but is never woken up again. We fixed this issue by postponing the removal of the CV thread from rmvQ until the woken thread that is allowed to proceed finishes its work; this thread is now responsible for removing itself from rmvQ, as well as waking up the next thread in rmvQ.
5.1. Proof effort and cost of change
Overall, our certified mC2 kernel consists of 6500 lines of C
and x86 assembly. The concurrency extensions were completed in about two person-years. The new concurrency
framework (to specify, build, and link certified concurrent
abstraction layers) took about one person-year to develop.
We extended the certified sequential mCertiKOS kernel5, 8, 10
(which took another two person-years to develop in total)
with various features, such as dynamic memory management, container support for controlling resource consumption, Intel hardware virtualization support, shared memory
IPC, two-copy synchronous IPC, ticket and MCS locks, new
schedulers, condition variables, etc. Some of these features
were initially added in the sequential setting but later ported
to the concurrent setting. Verifying these features took around one person-year. During this development process, many of our certified layers underwent repeated modifications and extensions. The CertiKOS framework
allows such incremental development to take place much
more smoothly. For example, certified layers in the sequential kernel can be directly ported to the concurrent setting if
they only access private state. We have also adapted the work
by Chen et al. 5 on interruptible kernels with device drivers to
our multicore model.
Regarding the proof effort, there are 5249 lines of additional specifications for the various kernel functions, and about 40K LOC of auxiliary definitions, lemmas, theorems, and invariants. Additionally, there are 50K lines of proof scripts for proving the newly added concurrency features.
5.2. Bugs found
Other than the FIFOBBQ bug, we have also found a few other bugs during verification. Our initial ticket-lock implementation contained a particularly subtle bug: the spinning loop body (line 15 in Figure 1) was implemented as while(get_n() < my_t). This passed all our tests, but during verification we found that it did not satisfy the atomic specification because the ticket field might overflow. For example, if the next ticket number t is 2^32 − 1, an overflow occurs in acq (line 14 in Figure 1) and the returned ticket my_t equals 0. In this case, the current-serving number n is not less than my_t, so acq acquires the lock immediately, violating the mutual exclusion property.
5.3. Performance evaluation
Although performance is not the main emphasis of this
work, we have run a number of micro and macro benchmarks to measure the speedup and overhead of mC2, and to
compare mC2 with existing systems such as KVM and seL4.
All experiments have been performed on a machine with
one Intel Xeon CPU with four cores running at 2.8 GHz. As
the power control code has not been verified, we disabled
the turbo boost and power management features of the
hardware during experiments.
5.4. Concurrency overhead
The runtime overhead introduced by concurrency in mC2
mainly comes from the latency of spinlocks.
Figure 7. Each CPU has a private ready queue ReadyQ and a shared
pending queue PendQ. The environmental CPUs can insert threads
into the current CPU’s PendQ. The mC2 kernel also provides a set of
shared sleeping queues SleepQs. The yield primitive moves a thread
from PendQ to ReadyQ and then switches to the next ready thread.
The sleep primitive simply adds the running thread to a SleepQ
and runs the next ready thread. The wakeup primitive contains two
cases. If the thread to be woken up belongs to the current CPU, it
will be added to the corresponding ReadyQ. Otherwise, the thread is
added to the PendQ of the CPU it belongs to.