those pages that are unlikely to be accessed in the near
future; an efficient implementation of compression nicely
complements page sharing and patching.
In this paper, we present Difference Engine, an extension
to the Xen VMM5 that not only shares identical pages, but
also supports subpage sharing and in-memory compression of infrequently accessed pages. Our results show that
for a heterogeneous setup (different operating systems hosting different applications), Difference Engine can reduce
memory usage by nearly 65%. In head-to-head comparisons
against VMware’s ESX server running the same workloads,
Difference Engine delivers a factor of 1.5 more memory savings for a homogeneous workload and a factor of 1.6–2.5
more memory savings for heterogeneous workloads.
Critically, we demonstrate that these benefits can be
obtained without negatively impacting application performance: in our experiments across a variety of workloads,
Difference Engine imposes less than 7% overhead. We further show that Difference Engine can leverage improved
memory efficiency to increase aggregate system performance by utilizing the free memory to create additional virtual machines in support of a target workload.
2. RELATED WORK
Difference Engine builds upon substantial previous work in
page sharing, delta encoding, and memory compression. In
each instance, we attempt to leverage existing approaches
where appropriate.
2.1. Page sharing
Two common approaches in the literature for finding redundant pages are content-based page sharing, exemplified by
VMware’s ESX server,19 and explicitly tracking page changes
to build knowledge of identical pages, exemplified by
“transparent page sharing” in Disco.7 Transparent page sharing
can be more efficient, but requires several modifications
to the guest OS, in contrast to ESX server and Difference
Engine, which require no modifications.
To find sharing candidates, both Difference Engine and
ESX hash contents of each page and use hash collisions to
identify potential duplicates. Once shared, our system can
manage page updates in a copy-on-write fashion, as in Disco
and ESX server. We build upon earlier work on flash cloning18
of VMs, which allows new VMs to be cloned from an existing VM in milliseconds; as the newly created VM writes to
its memory, it is given private copies of the shared pages.
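The hashing step described above can be sketched in a few lines of Python. The data structures and names here are illustrative only, not Difference Engine's actual implementation (which operates on machine pages inside the VMM):

```python
import hashlib

def page_hash(page: bytes) -> str:
    # Hash the full page contents; equal hashes flag potential duplicates.
    return hashlib.sha1(page).hexdigest()

def find_share_candidates(pages):
    """Group pages whose contents are identical.

    pages: dict mapping a (vm_id, pfn) identifier -> page contents (bytes).
    Returns lists of page identifiers that could share one machine page.
    """
    buckets = {}
    for ident, contents in pages.items():
        buckets.setdefault(page_hash(contents), []).append(ident)
    groups = []
    for members in buckets.values():
        # Verify byte-for-byte equality before sharing, since a hash
        # collision alone does not guarantee identical contents.
        first = pages[members[0]]
        dup = [m for m in members if pages[m] == first]
        if len(dup) > 1:
            groups.append(dup)
    return groups
```

Once a group is found, the system would keep one copy and map the others to it copy-on-write, as described above.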
Kloster et al. studied page sharing in an extension to Xen,11
and we build upon this experience, adding support for fully
virtualized (HVM) guests, integrating the global clock, and
improving the overall reliability and performance.
2.2. Delta encoding
Our initial investigations into page similarity were inspired
by research in leveraging similarity across files in large file
systems. In GLIMPSE,14 Manber proposed computing Rabin
fingerprints over fixed-size blocks at multiple offsets in a
file. Similar files will then share some fingerprints. Thus the
maximum number of common fingerprints is a strong indicator of similarity. However, in a dynamically evolving virtual
memory system, this approach does not scale well, since
every time a page changes, its fingerprints must be recomputed as well. Further, it is inefficient to find the maximal
intersecting set from among a large number of candidate
pages. Broder adapted Manber’s approach to the problem
of identifying documents (in this case, Web pages) that are
nearly identical using a combination of Rabin fingerprints
and sampling based on minimum values under a set of random permutations.6
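To make the fingerprint-and-sampling idea concrete, here is a toy min-hash similarity estimator in the spirit of Broder's technique. The block size, the number of hash functions, and the use of salted BLAKE2 hashes in place of Rabin fingerprints and random permutations are all simplifying assumptions, not the published construction:

```python
import hashlib

def minhash_signature(data: bytes, block: int = 64, num_hashes: int = 8):
    # Fingerprint every fixed-size block at every offset (shingling),
    # then keep the minimum value under each of several seeded hashes.
    shingles = {data[i:i + block] for i in range(len(data) - block + 1)}
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s, digest_size=8,
                                salt=seed.to_bytes(16, "little")).digest(),
                "big")
            for s in shingles))
    return sig

def similarity(sig_a, sig_b):
    # The fraction of matching minima estimates the Jaccard
    # similarity of the two shingle sets.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two similar pages share many shingles, so their per-seed minima tend to coincide; two unrelated pages almost never agree.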
While these techniques can be used to identify similar
files, they do not address how to efficiently encode the differences. Douglis and Iyengar explored using Rabin fingerprints and delta encoding to compress similar files in the
DERD system,10 but only considered whole files. Kulkarni
et al.12 extended the DERD scheme to exploit similarity at the
block level. Difference Engine also tries to exploit memory
redundancy at several different granularities.
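As a rough illustration of delta encoding between similar pages, the sketch below XORs a page against a reference and compresses the mostly zero difference. Real systems, including Difference Engine, use a proper delta encoder such as xdelta, so this is only a stand-in for the idea:

```python
import zlib

def make_patch(reference: bytes, target: bytes) -> bytes:
    # XOR two similar, equal-sized pages: bytes that match become
    # zeros, and the long runs of zeros compress extremely well.
    diff = bytes(r ^ t for r, t in zip(reference, target))
    return zlib.compress(diff)

def apply_patch(reference: bytes, patch: bytes) -> bytes:
    # Reverse the process: decompress the diff and XOR it back
    # against the reference to reconstruct the original page.
    diff = zlib.decompress(patch)
    return bytes(r ^ d for r, d in zip(reference, diff))
```

For two pages that differ in only a handful of bytes, the stored patch is a tiny fraction of a page, which is exactly the savings subpage sharing exploits.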
2.3. Memory compression
In-memory compression is not a new idea. Douglis et al.9
implemented memory compression in the Sprite operating system with mixed results. In their experience, memory
compression was sometimes beneficial, but at other times
the performance overhead outweighed the memory savings.
Subsequently, Wilson et al. argued that Douglis’ mixed results
were primarily due to slow hardware.20 They also developed new compression algorithms (leveraged by Difference
Engine) that exploit the inherent structure present in virtual memory, whereas earlier systems used general-purpose
compression algorithms. Tuduce et al.17 implemented a
compressed cache for Linux that adaptively manages the
amount of physical memory devoted to compressed pages
using a simple algorithm shown to be effective across a wide
variety of workloads.
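A compressed cache must weigh the space saved against the cost of keeping a page compressed. The minimal sketch below compresses a page only when doing so saves enough space; the threshold value and helper names are illustrative assumptions, not taken from any of the systems above:

```python
import zlib

def maybe_compress(page: bytes, min_ratio: float = 0.8):
    # Compress an infrequently accessed page, but keep it uncompressed
    # if compression would not save enough space to be worthwhile.
    packed = zlib.compress(page, level=6)
    if len(packed) < min_ratio * len(page):
        return ("compressed", packed)
    return ("raw", page)

def restore(entry):
    # Transparently recover the original page on access.
    kind, data = entry
    return zlib.decompress(data) if kind == "compressed" else data
```

An adaptive scheme like Tuduce et al.'s would additionally grow or shrink the total memory devoted to compressed pages based on observed workload behavior.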
3. ARCHITECTURE
Difference Engine uses three distinct mechanisms that
work together to realize the benefits of memory sharing, as
shown in Figure 1. In this example, two VMs have allocated
five pages total, each initially backed by distinct pages in
machine memory (Figure 1a). For brevity, we only show how
the mapping from guest physical memory to machine memory changes; the guest virtual to guest physical mapping
remains unaffected. First, for identical pages across the
VMs, we store a single copy and create references that point
to the original. In Figure 1b, one page in VM-2 is identical to
one in VM-1. For pages that are similar, but not identical, we
store a patch against a reference page and discard the redundant copy. In Figure 1c, the second page of VM-2 is stored as
a patch to the second page of VM-1. Finally, for pages that
are unique and infrequently accessed, we compress them in
memory to save space. In Figure 1d, the remaining private
page in VM-1 is compressed. The actual machine memory
footprint is now less than three pages, down from five pages
originally.
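The three mechanisms of Figure 1 can be summarized as a simple dispatch over each page. The function below, its parameters, and the similarity lookup are hypothetical stand-ins for Difference Engine's internal machinery, intended only to show the order in which the mechanisms are tried:

```python
import hashlib
import zlib

def store_page(page_id, page, hash_index, find_similar, recently_used):
    """Toy dispatcher mirroring Figure 1's three mechanisms.

    hash_index: full-page hash -> id of the reference copy (sharing)
    find_similar: callable returning a similar reference id or None
                  (a stand-in for the fingerprint-based lookup)
    recently_used: result of the global clock's recency check
    """
    digest = hashlib.sha1(page).hexdigest()
    if digest in hash_index:
        return ("share", hash_index[digest])          # identical (Figure 1b)
    ref = find_similar(page)
    if ref is not None:
        return ("patch", ref)                         # similar (Figure 1c)
    if not recently_used:
        return ("compress", zlib.compress(page))      # cold, unique (Figure 1d)
    hash_index[digest] = page_id                      # keep as a reference copy
    return ("raw", page_id)
```

Hot, unique pages stay untouched, which is why selecting rarely accessed candidates (the global clock's job) matters for all three mechanisms.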
In all three cases, efficiency concerns require us to select
candidate pages that are unlikely to be accessed in the near
future. We employ a global clock that scans memory in the
background, identifying pages that have not been recently