in CPU caches, and data on storage. In a traditional server, all these resources are controlled by your operating system at the behest of the programs running on top of it; but in a cloud, there is another layer, the virtual machine, which adds another turtle to the stack. Even when it is turtles all the way down, that extra turtle is going to be the source of resource variation. This is one reason you saw inconsistent results after you moved your system to the cloud.

Let's think only about the use of CPU caches for a moment. Modern CPUs gain quite a bit of their overall performance from having large, efficiently managed L1, L2, and sometimes L3 caches. The CPU caches are shared among all programs, but in the case of a virtualized system with several tenants, the amount of cache available to any one program—such as your database or Memcached server—decreases linearly with the addition of each tenant. If you had a beefy server in your original colocation facility, you were definitely gaining a performance boost from the large caches in those CPUs. The very same server running at a cloud provider is going to give your programs drastically less cache space with which to work.

With less cache, fewer things are kept in fast memory, meaning your programs now need to go to regular RAM, which is often much slower than cache. Those accesses to memory are now competing with other tenants that are also squeezed for cache. Therefore, although the real server on which the instances are running might be much larger than your original hardware—perhaps holding nearly a terabyte of RAM—each tenant receives far worse performance in a virtual instance of the same memory size than it would if it had a real server with the same amount of memory.

Let's imagine this with actual numbers. If your team owned a modern dual-processor server with 128 gigabytes of RAM, each processor would have 16 megabytes—not gigabytes—of L2 cache. If that server is running an operating system, a database, and Memcached, then those three programs share that 16 megabytes. Taking the same server, increasing the memory to 512 gigabytes, and then adding four tenants means the available cache space has now shrunk to one-fourth of what it was—each tenant now receives only four megabytes of L2 cache and must compete with three other tenants for all the same resources it had before. In modern computing, cache is king, and if your cache is cut, you are going to feel it, as you did when trying to fix your performance problems.

Most cloud providers offer systems that are non-elastic as well as elastic, but having a server always available in a cloud service is more expensive than hosting one at a traditional colocation facility. Why is that? It is because the economies of scale for cloud providers work only if everyone is playing the game and allowing the cloud provider to dictate how resources are consumed. Some providers now have something called Metal-as-a-Service, which I really think ought to mean that a 1980s-era metal band shows up at your office, plays a gig, and smashes the furniture—but alas, it is just the cloud providers' way of finally admitting that cloud computing is not really the right answer for all applications. For systems that require deterministic performance guarantees to work well, you really must think hard about whether a cloud-based system is the right answer, because providing deterministic guarantees requires quite a bit of control over the variables in the environment. Cloud systems are not about giving you control; they are about the owner of the systems having the control.

KV

Related articles
on queue.acm.org

Cloud Calipers
Kode Vicious
https://queue.acm.org/detail.cfm?id=2993454

20 Obstacles to Scalability
Sean Hull
https://queue.acm.org/detail.cfm?id=2512489

A Guided Tour through Data-center Networking
Dennis Abts and Bob Felderman
https://queue.acm.org/detail.cfm?id=2208919

George V. Neville-Neil (kv@acm.org) is the proprietor of Neville-Neil Consulting and co-chair of the ACM Queue editorial board. He works on networking and operating systems code for fun and profit, teaches courses on various programming-related subjects, and encourages your comments, quips, and code snips pertaining to his Communications column.

Copyright held by author.
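The column's cache arithmetic can be sketched in a few lines of Python. The per-tenant figure comes straight from the text (16 megabytes of L2 split across four tenants); the uniform-access hit-rate model and the 12-megabyte working-set figure are illustrative assumptions of mine, not numbers from the column.

```python
# Back-of-the-envelope check of the column's cache numbers.
# From the text: 16 MB of L2 per processor, later split across four tenants.
L2_MB = 16          # per-processor L2 cache in the example
TENANTS = 4         # tenants packed onto the larger 512-GB server

per_tenant_mb = L2_MB / TENANTS
print(per_tenant_mb)  # 4.0 -> each tenant's slice of L2


def uniform_hit_rate(cache_mb, working_set_mb):
    """Crude model: with uniformly random accesses over the working set,
    the chance an access hits is capped by cache size / working-set size."""
    return min(1.0, cache_mb / working_set_mb)


# Hypothetical 12-MB hot set: it fits in 16 MB of L2 but not in a 4-MB slice.
print(uniform_hit_rate(16, 12))  # 1.0 -> every access can hit
print(uniform_hit_rate(4, 12))   # 0.3333333333333333 -> two of three accesses miss
```

Even under this crude model, quartering the cache turns an all-hit workload into one that misses on two of every three accesses, which is where the "cache is king" pain comes from.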