from additional device compatibility or
remove the entire OS and sacrifice device compatibility by requiring hypervisor-specific drivers for every device.
29
No matter how small the TCB is
made, sharing hardware requires a
software component to mediate access to the shared hardware. Being
both complex and highly privileged,
this software is a real concern for the
security of the system, an observation
that raises the question of whether it is
really necessary to share hardware resources at all.
Researchers have argued that a static partitioning of system resources can
eliminate the virtualization platform
from the TCB altogether.
18 The virtualization platform traditionally orchestrates the booting of the system, multiplexing the virtual resources exposed
to virtual machines onto the available
physical resources. However, static
partitioning obviates the need for such
multiplexing in exchange for a loss of
flexibility in resource allocation. Partitioning physical CPUs and memory is
relatively straightforward; each virtual
machine is assigned a fixed number
of CPU cores and a dedicated region
of memory that is isolated using the
hardware support for virtualizing page
tables. Devices (such as network cards
and hard disks) pose an even greater
challenge, since it is not reasonable to
dedicate an entire device to each virtual machine.
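The static partitioning of CPUs and memory described above can be illustrated with a minimal sketch. It is not taken from any particular platform; the `Partition` class and `validate` function are hypothetical names, and the check simply confirms that each core and each memory region has exactly one owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical description of one VM's fixed share of the machine."""
    name: str
    cores: frozenset      # physical core IDs owned exclusively by this VM
    mem_start: int        # first byte of the dedicated memory region
    mem_size: int         # size of the region in bytes

    @property
    def mem_end(self):
        return self.mem_start + self.mem_size  # exclusive upper bound


def validate(partitions, total_cores, total_mem):
    """Raise ValueError unless every core and memory region has one owner."""
    seen_cores = set()
    for p in partitions:
        if not p.cores or not p.cores <= set(range(total_cores)):
            raise ValueError(f"{p.name}: invalid core set")
        if p.cores & seen_cores:
            raise ValueError(f"{p.name}: core shared with another partition")
        seen_cores |= p.cores
        if p.mem_size <= 0 or p.mem_start < 0 or p.mem_end > total_mem:
            raise ValueError(f"{p.name}: region outside physical memory")
    # Sort by start address; neighboring regions must not overlap.
    ordered = sorted(partitions, key=lambda p: p.mem_start)
    for a, b in zip(ordered, ordered[1:]):
        if a.mem_end > b.mem_start:
            raise ValueError(f"{a.name} and {b.name}: overlapping memory")
    return True
```

Because the assignment is fixed before any guest runs, a check of this kind happens once at configuration time, and no runtime multiplexing code is needed for these two resources.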
Fortunately, hardware virtualization
support is not limited to processors,
recently making inroads into devices
themselves. Single-root I/O virtualization (SR-IOV)
21 enables a single physical device to expose multiple virtual devices, each indistinguishable from the
original physical device. Each such virtual device can be allocated to a virtual
machine, with direct access to the device. All the multiplexing between the
virtual devices is performed entirely in
hardware. Network interfaces that support SR-IOV are increasingly popular,
with storage controllers likely to follow suit. However, while moving functionality to hardware does reduce the
amount of code to be trusted, there is
no guarantee the hardware is immune
to vulnerability or compromise.
Eliminating the hypervisor, while
attractive in terms of security, sacrifices
several benefits that make virtual-
ated, binary translation modifies the
instruction stream of the entire OS to
be virtualized, then executes this modified
code rather than the original OS
code.
Virtualizing memory requires a
complex page-table scheme called
“shadow page tables,”
7 an expansive,
extremely complicated process requiring
the hypervisor to maintain page
tables for each process in a hosted
virtual machine. The hypervisor must also monitor
any modifications to these page tables
to ensure isolation between different
virtual machines. Advances in processor
technology render this functionality
moot by virtualizing both processor
and memory directly in hardware.
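The bookkeeping behind shadow page tables can be sketched as a toy model. The names here (`build_shadow`, `translate`, `on_guest_pt_write`, and the `p2m` map) are illustrative assumptions, and real hypervisors work on multi-level hardware page tables rather than dictionaries.

```python
PAGE = 4096  # assumed page size in bytes


def build_shadow(guest_pt, p2m):
    """Compose guest-virtual -> guest-physical with guest-physical -> host-physical.

    guest_pt: page table kept by the guest OS for one process (page numbers).
    p2m:      hypervisor's map from guest-physical to host-physical pages.
    Entries pointing outside the VM's allotted memory are dropped.
    """
    return {vpn: p2m[gpn] for vpn, gpn in guest_pt.items() if gpn in p2m}


def translate(shadow, vaddr):
    """Translate a guest-virtual address using only the shadow table."""
    vpn, offset = divmod(vaddr, PAGE)
    if vpn not in shadow:
        raise LookupError("page fault")
    return shadow[vpn] * PAGE + offset


def on_guest_pt_write(guest_pt, shadow, p2m, vpn, gpn):
    """Trap handler: the guest edited its page table, so resync the shadow."""
    guest_pt[vpn] = gpn
    if gpn in p2m:
        shadow[vpn] = p2m[gpn]
    else:
        shadow.pop(vpn, None)  # guest referenced memory it does not own
```

The hardware only ever walks the shadow table, so isolation depends on the hypervisor intercepting every guest page-table update, which is the monitoring burden the text describes.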
Some systems further reduce the
size of the TCB by splitting the functionality
of the virtualization platform
between a simple, low-level, system-wide
hypervisor, responsible for isolation
and security, and more complex,
per-tenant hypervisors responsible for
the remaining functionality of conventional
virtualization platforms.
29, 35 By
reducing the shared surface between
multiple VMs, such architectures help
protect against cross-tenant attacks.
In such systems, removing a large
commodity OS from the TCB presents
an unenviable trade-off; systems can
either retain the entire OS and benefit
Not all discovered vulnerabilities are exploitable; in fact, most exploits rely on chaining
together multiple vulnerabilities. In 2009, Kostya Kortchinsky of Immunity Inc.
presented an attack that gave an administrator within a virtual machine running on a
VMware hypervisor access to the physical host.
20
This is notable for two reasons: It affected the entire family of VMware products,
so both Workstation and ESX server were vulnerable, and it was reliable enough that
Canvas, Immunity’s commercially available penetration testing tool, included a
“cloudburst” mode to exploit systems and deploy different payloads. Rather than remain
an esoteric proof of concept, it was indeed a commercial exploit available to anyone.
The virtualization platform exposes virtual devices to guest machines through
device emulation. The device emulation layer runs as a user-mode process within
the host, acting as a translation and multiplexing layer between virtual and physical
devices. Cloudburst exploited multiple vulnerabilities in the emulated video card
interface to allow the guest arbitrary read and write access to host memory, giving it the
ability to corrupt arbitrary regions of memory.
The emulated video card accepts requests from the guest virtual machine through
a FIFO command queue and responds to these requests by updating a virtual frame
buffer. Both the queue and the frame buffer reside in the address space of the
emulation process on the host (vmware-vmx) but are shared with the video driver
in the guest. The rest of the process’s address space is private and should remain
inaccessible to the guest at all times.
SVGA_CMD_RECT_COPY is an example of a request issued by the driver to the
emulator, specifying the (X, Y) coordinates and dimensions of a rectangle to be copied
along with the (X, Y) coordinates of the destination. The emulated device responds
by copying the appropriate regions, indexed relative to the start of the frame buffer.
However, due to incorrect boundary checking, the guest driver is able to supply an extremely
large or even negative X or Y coordinate and read data from arbitrary regions of the
process’s address space. Unfortunately for the attacker, due to stricter bounds checking around the
destination coordinates, arbitrary regions of process memory cannot be written to.
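The read primitive can be illustrated with a toy model. This is not VMware's code: the frame buffer size, the `FB_BASE` offset, the `SECRET` bytes, and the `rect_copy` function are all assumptions chosen to show how an unchecked source coordinate lets data outside the shared buffer be copied into it.

```python
WIDTH, HEIGHT = 16, 4          # toy frame buffer, one byte per pixel
FB_BASE = 64                   # assumed offset of the buffer in process memory
SECRET = b"host-private"       # stand-in for private emulator data

memory = bytearray(FB_BASE + WIDTH * HEIGHT + 64)
memory[8:8 + len(SECRET)] = SECRET   # lives outside the shared frame buffer


def rect_copy(mem, src_x, src_y, dst_x, dst_y, w, h):
    """Copy a w-by-h rectangle; addresses are relative to the frame buffer."""
    # The destination rectangle is validated strictly ...
    if not (w > 0 and h > 0 and 0 <= dst_x and 0 <= dst_y
            and dst_x + w <= WIDTH and dst_y + h <= HEIGHT):
        raise ValueError("destination outside frame buffer")
    # ... but the source is not (the modeled flaw). A correct handler
    # would apply the same test to src_x and src_y.
    for row in range(h):
        src = FB_BASE + (src_y + row) * WIDTH + src_x
        dst = FB_BASE + (dst_y + row) * WIDTH + dst_x
        if src < 0 or src + w > len(mem):
            raise MemoryError("unmapped address")  # stands in for a crash
        mem[dst:dst + w] = mem[src:src + w]


# A negative source row reaches memory that precedes the frame buffer.
rect_copy(memory, 8, -4, 0, 0, len(SECRET), 1)
leaked = bytes(memory[FB_BASE:FB_BASE + len(SECRET)])
```

Since the frame buffer is shared with the guest, anything copied into it becomes readable by the guest driver, which is why an unchecked source alone is enough for an information leak.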
Emulating 3D operations requires the emulated device to maintain some device state
or contexts. The contexts are stored as an array within the process but are not shared
with the guest, which requests updates to the contexts through the command queue.
The SVGA_CMD_SETRENDERSTATE command takes an index into the context array
and a value to be written at that location but does not perform bounds checking on
the value of the index, effectively allowing the guest to write to any region of process
memory, relative to the context array. This relative write can be further extended by
exploiting the SVGA_CMD_SETLIGHTENABLED command, which reads a pointer from
a fixed location within the context and writes the requested value to the memory the
pointer references. These two vulnerabilities can be chained to achieve arbitrary
memory writes; as the referenced pointer lies within the context array, it is easily
modified by exploiting the SETRENDERSTATE vulnerability.
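The chaining of the two flaws can be sketched in the same toy style. The `Emulator` class, the word-addressed memory, and the constants `CTX_BASE`, `CTX_WORDS`, and `LIGHT_PTR_SLOT` are hypothetical simplifications; only the logic (an unchecked index plus a pointer stored inside the array) follows the description above.

```python
CTX_BASE = 32        # hypothetical offset of the context array, in words
CTX_WORDS = 8        # hypothetical length of the context array
LIGHT_PTR_SLOT = 3   # hypothetical fixed slot holding the light pointer


class Emulator:
    def __init__(self, size=128):
        self.mem = [0] * size  # word-addressed process memory
        # The pointer initially targets state inside the context array.
        self.mem[CTX_BASE + LIGHT_PTR_SLOT] = CTX_BASE + 5

    def set_render_state(self, index, value):
        # Modeled flaw: index is never compared against CTX_WORDS,
        # so the write lands anywhere relative to CTX_BASE.
        self.mem[CTX_BASE + index] = value

    def set_light_enabled(self, value):
        # Dereferences a pointer that is stored inside the context array.
        pointer = self.mem[CTX_BASE + LIGHT_PTR_SLOT]
        self.mem[pointer] = value


def arbitrary_write(emu, address, value):
    """Chain both commands to write value at an absolute address."""
    emu.set_render_state(LIGHT_PTR_SLOT, address)  # step 1: retarget pointer
    emu.set_light_enabled(value)                   # step 2: write through it
```

In this model, calling `arbitrary_write(emu, 100, 0xBEEF)` stores the value at word 100 even though neither command was meant to touch memory outside the context array.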
When arbitrary reads and writes are possible, shellcode can be written into process
memory, then triggered by modifying a function pointer to reference the shellcode.
Because no-execute protection prevents injected shellcode from being executed, the function
pointer must first be used to call the appropriate memory-protection functions to mark these
regions of memory as executable code pages; once this is done, the exploit
proceeds normally.
Anatomy of an Attack