Figure 4. Memory savings for MIXED-2. Difference Engine saves almost twice as much memory as ESX. [Plot: savings (%) over time (s); curves for DE shared, DE patched, DE compressed, DE total, and ESX aggressive.]
Table 3. Application performance under Difference Engine for the heterogeneous workload MIXED-1 is within 7% of the baseline.

            Kernel        Vim compile,   RUBiS      RUBiS response
            compile (s)   lmbench (s)    requests   time (ms)
Baseline    –             620            3149       1280
DE          –             702            3130       1268
manner. One can certainly use the saved memory to create more
VMs, but does that increase the aggregate system performance?
To answer this question, we created four VMs with 650MB
of RAM each on a physical machine with 2.8GB of free
memory (excluding memory allocated to Domain-0). For the
baseline (without Difference Engine), Xen allocates memory statically. Upon creating all the VMs, there is clearly not
enough memory left to create another VM of the same configuration. Each VM hosts a RUBiS instance. For this experiment, we used the Java Servlets implementation of RUBiS.
There are two distinct client machines per VM to act as workload generators.
The goal is to increase the load on the system to saturation. The solid line in Figure 5 shows the total requests
served for the baseline, with the total offered load marked
on the X-axis. Beyond 960 clients, the total number of
requests served plateaus at around 180,000 while the
average response time (not shown) increases sharply. Upon
investigation, we find that for higher loads all of the VMs
have more than 95% memory utilization and some VMs actually start swapping to disk (within the guest OS). Using fewer
VMs with more memory (e.g., 2 VMs with 1.2GB RAM each)
did not improve the baseline performance for this workload.
Figure 5. Up to a limit, Difference Engine can help increase aggregate system performance by spreading the load across extra VMs. [Plot: total requests handled vs. total offered load (# clients); curves for the baseline with 4 VMs and Difference Engine with 5, 6, and 7 VMs.]

Next, we repeat the same experiment with Difference Engine, except this time we utilize reclaimed memory to create additional VMs. As a result, for each data point on the X-axis, the per-VM load decreases while the aggregate offered load remains the same. We expect that since each VM individually has lower load compared to the baseline, the system will
deliver better aggregate performance. The remaining lines
show the performance with up to three extra VMs. Clearly,
Difference Engine enables higher aggregate performance
compared to the baseline. However, beyond a certain point
(two additional VMs in this case), the overhead of managing the extra VMs begins to offset the performance benefits:
Difference Engine has to manage 4.5GB of memory on a system with 2.8GB of RAM to support seven VMs. In each case,
beyond 1,400 clients, the VMs' working sets become large enough to invoke the paging mechanism: we observe between 5,000 pages (for one extra VM) and around 20,000 pages (for three extra VMs) being swapped out, of which roughly a fourth are swapped back in.
6. Conclusion
One of the primary bottlenecks to higher degrees of virtual
machine multiplexing is main memory. Earlier work shows
that substantial memory savings are available from harvesting identical pages across virtual machines when running
homogeneous workloads. The premise of this work is that
there are significant additional memory savings available
from locating and patching similar pages and in-memory
page compression. We present the design and evaluation
of Difference Engine to demonstrate the potential memory
savings available from leveraging a combination of whole
page sharing, page patching, and compression. Our performance evaluation shows that Difference Engine delivers
an additional factor of 1.6–2.5 more memory savings than
VMware ESX server for a variety of workloads, with minimal
performance overhead. Difference Engine mechanisms
might also be leveraged to improve single OS memory management; we leave such exploration to future work.
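To make the combination of mechanisms concrete, the following sketch classifies a set of pages as shareable, patchable, or compressible. It is an illustrative toy in Python rather than the Difference Engine implementation: the similarity signature, the byte-level patch format, and the 50% patch-size threshold are assumptions chosen for exposition, whereas the real system operates on 4KB machine pages inside the hypervisor with more refined heuristics.

    # Toy illustration of sharing, patching, and compression (not the real implementation).
    import hashlib
    import zlib

    PAGE_SIZE = 4096

    def similarity_key(page):
        # Crude "similar page" signature: hash a few fixed 64-byte blocks.
        return hashlib.sha1(page[0:64] + page[1024:1088] + page[2048:2112]).digest()

    def naive_patch(ref, page):
        # Toy patch: (offset, byte) pairs at which the two pages differ.
        diffs = [(off, page[off]) for off in range(PAGE_SIZE) if page[off] != ref[off]]
        return b"".join(off.to_bytes(2, "big") + bytes([b]) for off, b in diffs)

    def classify(pages):
        by_hash, by_sig, labels = {}, {}, []
        for i, page in enumerate(pages):
            h, sig = hashlib.sha1(page).digest(), similarity_key(page)
            if h in by_hash:                       # identical page: share it
                labels.append(f"page {i}: share with page {by_hash[h]}")
                continue
            if sig in by_sig:                      # similar page: try to store only a patch
                patch = naive_patch(pages[by_sig[sig]], page)
                if len(patch) < PAGE_SIZE // 2:    # assumed threshold for a worthwhile patch
                    labels.append(f"page {i}: patch against page {by_sig[sig]} ({len(patch)} bytes)")
                    by_hash[h] = i
                    continue
            labels.append(f"page {i}: compress to {len(zlib.compress(page))} bytes")
            by_hash[h], by_sig[sig] = i, i
        return labels

    if __name__ == "__main__":
        base = bytes(range(256)) * (PAGE_SIZE // 256)
        similar = bytearray(base)
        similar[100] = 0xFF                        # one-byte difference from base
        print("\n".join(classify([base, base, bytes(similar), bytes(PAGE_SIZE)])))

Even in this toy, the ordering mirrors the intuition above: identical pages are shared outright, pages with a small delta against a similar reference are patched, and pages with no good reference fall back to compression.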
Acknowledgments
In the course of the project, we received invaluable assistance from a number of people at VMware. We would like to
thank Carl Waldspurger, Jennifer Anderson, and Hemant
Gaidhani, and the Performance Benchmark group for feedback and discussions on the performance of ESX server.
Also, special thanks are owed to Kiran Tati for assisting