services as a set of interacting distributed components. We propose a new
“deconstructed OS” called Tessellation
structured around space-time partitioning and two-level scheduling between
the operating system and application
runtimes. Tessellation implements
scheduling and resource management
at the partition granularity. Applications and OS services (such as file systems) run within their own partitions.
Partitions are lightweight and can be
resized or suspended with overheads similar to those of a process context switch.
A key tenet of our approach is that
resources given to a partition are either
exclusive (such as cores or private caches) or guaranteed via a quality-of-service
contract (such as a minimum fraction
of network or memory bandwidth).
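This two-way split of a partition's resource grant can be sketched as a small data model. All names here (`QoSContract`, `Partition`, the `resize`/`suspend` methods) are hypothetical illustrations, not Tessellation's actual interfaces:

```python
from dataclasses import dataclass, field

@dataclass
class QoSContract:
    """A guaranteed minimum share of a multiplexed resource."""
    resource: str        # e.g. "memory_bandwidth"
    min_fraction: float  # guaranteed lower bound, 0.0..1.0

@dataclass
class Partition:
    """Toy model of a partition's resource grant: resources are either
    held exclusively (cores, private cache) or covered by a QoS contract."""
    name: str
    cores: set = field(default_factory=set)        # exclusive: whole cores
    private_cache_kb: int = 0                      # exclusive: private cache
    contracts: list = field(default_factory=list)  # QoS-guaranteed shares
    suspended: bool = False

    def resize(self, new_cores):
        # Swapping the exclusive core set; the article notes the overhead
        # is comparable to a process context switch.
        self.cores = set(new_cores)

    def suspend(self):
        self.suspended = True

# A file-system service partition: two exclusive cores plus a
# guaranteed 20% of memory bandwidth.
fs = Partition("fs-service", cores={2, 3},
               contracts=[QoSContract("memory_bandwidth", 0.20)])
fs.resize({2, 3, 4})  # grow the partition by one core
```

The point of the split is that nothing a partition holds can be silently degraded by another partition: it is either physically private or contractually bounded.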
During a scheduling quantum, the application runtime within a partition
is given unrestricted “bare metal” access to its resources and may schedule
tasks onto them however it sees fit. Within
a partition, our approach has much in
common with the Exokernel.11 In the
common case, we expect many application runtimes to be written as libraries (similar to libOS). Our Tessellation
kernel is a thin layer responsible for
only the coarse-grain scheduling and
assignment of resources to partitions
and implementation of secure restricted communications among partitions.
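The two scheduling levels can be illustrated with a toy simulation (all names invented): the kernel only hands out coarse-grain quanta to partitions, while each partition's own user-level runtime decides which of its tasks run on the cores it was granted.

```python
# Toy two-level scheduler. Level 1 ("kernel") picks which partition owns
# its resources for each quantum; level 2 (the partition's own runtime
# library) maps that partition's tasks onto its cores. The kernel never
# sees or mediates the per-task decisions.

def kernel_schedule(partitions, quanta):
    """Level 1: round-robin coarse-grain quanta across partitions."""
    trace = []
    for q in range(quanta):
        part = partitions[q % len(partitions)]
        # During its quantum the partition has bare-metal access:
        # its runtime alone decides what runs.
        trace.append((part["name"], part["runtime"](part)))
    return trace

def greedy_runtime(part):
    """Level 2: a library runtime draining its own task queue."""
    ncores = len(part["cores"])
    running, part["tasks"] = part["tasks"][:ncores], part["tasks"][ncores:]
    return running

app = {"name": "app", "cores": [0, 1], "runtime": greedy_runtime,
       "tasks": ["t1", "t2", "t3"]}
fs  = {"name": "fs",  "cores": [2],    "runtime": greedy_runtime,
       "tasks": ["flush"]}
trace = kernel_schedule([app, fs], quanta=3)
# trace: [("app", ["t1", "t2"]), ("fs", ["flush"]), ("app", ["t3"])]
```

Different partitions could plug in entirely different level-2 policies (work stealing, gang scheduling, real-time) without any kernel involvement.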
The Tessellation kernel is much thinner than traditional kernels or even
hypervisors. It avoids many of the performance issues associated with traditional microkernels by providing OS
services through secure messaging to
spatially co-resident service partitions,
rather than context-switching to time-multiplexed service processes.
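The difference from a traditional microkernel can be sketched as follows: a service such as the file system runs continuously on its own cores (modeled here by a thread), and a client partition sends it a message over a channel rather than trapping and context-switching into a service process. The channel and message formats below are invented for illustration:

```python
import queue
import threading

# Toy model of a spatially co-resident OS service: the "fs" service
# partition runs concurrently on its own resources, and clients reach
# it through a message channel instead of a context switch.

requests, replies = queue.Queue(), queue.Queue()

def fs_service():
    """Service-partition loop, always resident on its own cores."""
    while True:
        op, arg = requests.get()
        if op == "shutdown":
            break
        # Handle the request and reply over the channel.
        replies.put(("ok", f"contents of {arg}"))

t = threading.Thread(target=fs_service, daemon=True)
t.start()

requests.put(("read", "/etc/motd"))  # client partition sends a message...
status, data = replies.get()         # ...and waits for the reply
requests.put(("shutdown", None))
t.join()
```

Because the service is already running on other cores, the request costs a message round-trip rather than a scheduler invocation plus two context switches.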
Par Lab hardware tower. Past parallel
projects were often driven by the hardware determining the application and
software environment. The Par Lab is
driven top down from the applications,
so the question this time is what should
architects do to help with the goals of
productivity, efficiency, correctness,
portability, and scalability?
Here are four examples of this kind
of help that illustrate our approach:
Supporting OS partitioning. Our hardware architecture enforces partitioning of not only the cores and on-chip/
off-chip memory but also the communication bandwidth among these components, providing quality-of-service
guarantees. The resulting performance
predictability improves parallel program performance, simplifies code autotuning and dynamic load balancing,
and supports real-time applications.
Optional explicit control of the memory hierarchy. Caches were invented so
hardware could manage a memory hierarchy without troubling the programmer. When it takes hundreds of clock
cycles to go to memory, programmers
and compilers try to reverse-engineer
the hardware controllers to make better use of the hierarchy. This backward
situation is especially apparent for
hardware prefetchers when programmers try to create a particular pattern
that will invoke good prefetching. Our
approach aims to allow programmers
to quickly turn a cache into an explicitly
managed local store and the prefetch
engines into explicitly controlled Direct Memory Access engines. To make it
easy for programmers to port software
to our architecture, we also support a
traditional memory hierarchy. The low-overhead mechanism we use allows
programs to be composed of methods
that rely on local stores and methods
that rely on memory hierarchies.
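What "turning a cache into an explicitly managed local store" might look like from software can be sketched with a toy model. The `LocalStore` class and `dma_get` call are invented names standing in for the reconfiguration mechanism and the explicitly controlled DMA engines the paragraph describes:

```python
# Hypothetical sketch: a cache region reconfigured as a software-managed
# local store, with an explicit DMA transfer replacing a hardware
# prefetcher. DRAM is modeled as a dict of named arrays.

DRAM = {"a": list(range(8))}  # backing memory

class LocalStore:
    """A cache reconfigured as an explicitly managed on-chip buffer."""
    def __init__(self, capacity):
        self.capacity = capacity  # number of buffers it can hold
        self.data = {}

    def dma_get(self, key):
        # Explicit DMA: the program, not a prefetch engine, decides what
        # is staged on chip and when -- no reverse-engineering of
        # hardware prefetch heuristics required.
        assert len(self.data) < self.capacity, "local store full"
        self.data[key] = list(DRAM[key])

ls = LocalStore(capacity=4)
ls.dma_get("a")            # stage the array on chip
total = sum(ls.data["a"])  # compute entirely out of the local store
```

A method written against `LocalStore` and a method written against an ordinary cached load path could then coexist in one program, which is the composition property the mechanism is meant to preserve.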
Accurate, complete counters of performance and energy. Sadly, performance
counters on current single-core computers often miss important measurements (such as prefetched data) or are
unique to a computer and only understandable by the machine’s designers.
We will include performance enhancements in the Par Lab architecture only
if they have counters to measure them
accurately and coherently. Since energy
is as important as performance, we also
include energy counters so software can
improve both. Moreover, these counters must be integrated with the software stack to provide insightful measurements to the efficiency-layer and
productivity-layer programmers. Ideally, this research will lead to a standard
for performance counters so schedulers and software development kits can
count on them on any multicore.
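With energy counters alongside performance counters, software can optimize a metric such as energy per operation rather than speed alone. The counter names and values below are invented; the sketch only shows the kind of comparison a scheduler or autotuner could make:

```python
# Sketch: using combined performance/energy counters to compare two
# versions of the same computation. Counter names ("ops_retired",
# "joules", "prefetched_bytes") are hypothetical.

def energy_per_op(counters):
    """Joules spent per useful retired operation."""
    return counters["joules"] / counters["ops_retired"]

baseline = {"ops_retired": 1_000_000, "joules": 2.0,
            "prefetched_bytes": 4_096}    # prefetches are counted, not missed
tuned    = {"ops_retired": 1_000_000, "joules": 1.5,
            "prefetched_bytes": 65_536}

# Pick the version that does the same work for less energy.
best = min([baseline, tuned], key=energy_per_op)
```

If such counters were standardized across multicores, the same autotuning loop would port unchanged from one machine to the next.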
Intuitive performance model. The
multicore diversity mentioned earlier
exacerbates the already difficult jobs
performed by programmers, compiler
writers, and architects. Hence, we developed an easy-to-understand visual