works, not that the user-level policy makes sense. The mechanism works if it keeps kernel code and data structures safe
from user access, if the virtual memory (VM) subsystem is
fully controlled by the kernel interface via capabilities, and if
it provides the necessary functionality for user level to manage its own VM policies.
Obviously, moving policy into user land does not change
the fact that memory allocation is part of the TCB. It does
mean, however, that memory allocation can be verified separately, and can rely on verified kernel properties.
The memory-management model gives free memory
to the user-level manager in the form of regions tagged as
untyped. The memory manager can split untyped regions
and retype them into one of several kernel object types (one
of them, frame, is for user-accessible memory); such operations create new capabilities. Object destruction converts a
region back to untyped (and invalidates derived capabilities).
Before reusing a block of memory, all references to this
memory must be invalidated. This involves either finding all outstanding capabilities to the object, or returning
the object to the memory pool only when the last capability is deleted. Our kernel uses both approaches. In the first
approach, a so-called capability derivation tree is used to
find and invalidate all capabilities referring to a memory
region. In the second approach, the capability derivation
tree is used to ensure, with a check that is local in scope, that
there are no system-wide dangling references. This is possible because all other kernel objects have further invariants
on their own internal references that relate back to the existence of capabilities in this derivation tree.
Similar book-keeping would be necessary for a traditional malloc/free model in the kernel. The difference is that
the complicated free case in our model is concentrated in
one place, whereas otherwise it would be repeated numerous times over the code.
Concurrency and nondeterminism: Concurrency is the
execution of computation in parallel (in the case of multiple
hardware processors), or by nondeterministic interleaving
via a concurrency abstraction like threads. Reasoning about
concurrent programs is hard, much harder than reasoning
about sequential programs. For the time being, we limited
the verification to a single-processor version of seL4.
In a uniprocessor kernel, concurrency can result from
three sources: yielding of the processor from one thread to
another, (synchronous) exceptions and (asynchronous)
interrupts. Yielding can be synchronous, by an explicit handover,
such as when blocking on a lock, or asynchronous, by pre-emption (but in a uniprocessor kernel, the latter can only
happen as the result of an interrupt).
We limit the effect of all three by a kernel design which
explicitly minimizes concurrency.
Exceptions are completely avoided, by ensuring that they
never occur. For instance, we avoid virtual-memory exceptions by allocating all kernel data structures in a region of
VM which is always guaranteed to be mapped to physical
memory. System-call arguments are either passed in registers or through preregistered physical memory frames.
The complexity of synchronous yield we avoid by using
an event-based kernel execution model, with a single kernel
stack, and a mostly atomic application programming interface. This is aided by the traditional L4 model of system calls
which are primitive and mostly short-running.
We minimize the effect of interrupts (and hence pre-emptions) by disabling interrupts during kernel execution.
Again, this is aided by the L4 model of short system calls.
However, not all kernel operations can be guaranteed to
be short; object destruction especially can require almost
arbitrary execution time, so not allowing any interrupt processing during a system call would rule out the use of the
kernel for real-time applications, undermining the goal of
real-world deployability.
We ensure bounded interrupt latencies by the standard
approach of introducing a few, carefully placed, interrupt
points. On detection of a pending interrupt, the kernel
explicitly returns through the function call stack to the ker-nel/user boundary and responds to the interrupt. It then
restarts the original operation, including reestablishing all
the preconditions for execution. As a result, we completely
avoid concurrent execution in the kernel.
i/o: Interrupts are used by device drivers to affect I/O. L4
kernels traditionally implement device drivers as user-level
programs, and seL4 is no different. Device interrupts are
converted into messages to the user-level driver.
This approach removes a large amount of complexity
from the kernel implementation (and the proof). The only
exception is an in-kernel timer driver which generates timer
ticks for scheduling, which is straightforward to deal with.
4. VeRification of seL4
This section gives an overview of the formal verification of
seL4 in the theorem prover Isabelle/HOL. 8 The property
we are proving is functional correctness. Formally, we are
showing refinement: A refinement proof establishes a correspondence between a high-level (abstract) and a low-level
(concrete, or refined) representation of a system.
The correspondence established by the refinement proof
ensures that all Hoare logic properties of the abstract model
also hold for the refined model. This means that if a security
property is proved in Hoare logic about the abstract model
(not all security properties can be), our refinement guarantees
that the same property holds for the kernel source code. In this
paper, we concentrate on the general functional correctness
property. We have also modelled and proved the security of
seL4’s access-control system in Isabelle/HOL on a high level. 3
Figure 2 shows the specification layers used in the verification of seL4; they are related by formal proof. In the following sections we explain each layer in turn.
4. 1. abstract specification
The abstract level describes what the system does without
saying how it is done. For all user-visible kernel operations,
it describes the functional behavior that is expected from
the system. All implementations that refine this specification will be binary compatible.
We precisely describe argument formats, encodings and
error reporting, so, for instance, some of the C-level size
restrictions become visible on this level. We model finite
machine words, memory, and typed pointers explicitly.