isn’t as good as the theoretical systems
that M/M/m models. Therefore, the utilization values at which your system’s
knees occur will be more constraining
than the values in Table 1. (I use the plural of values and knees, because you can
model your CPUs with one model, your
disks with another, your I/O controllers
with another, and so on.)
To recap:
• Each of the resources in your system has a knee.
• That knee for each of your resources is less than or equal to the knee value you can look up in Table 1. The more imperfectly your system scales, the smaller (worse) your knee value will be.
• On a system with random request arrivals, if you allow your sustained utilization for any resource on your system to exceed your knee value for that resource, then you will have performance problems.
• Therefore, it is vital that you manage your load so that your resource utilizations will not exceed your knee values.
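The idealized knee values that the recap refers to can be approximated numerically. The following is a minimal sketch, not the author's own method. It assumes that the knee is the utilization at which mean response time divided by utilization is smallest, uses the standard Erlang C formula for an M/M/m queue, and finds the minimum by a simple grid search. The function names (`erlang_c`, `response_time`, `knee`) and the grid size are illustrative assumptions.

```python
import math

def erlang_c(m: int, rho: float) -> float:
    """Probability that an arriving request has to queue in an M/M/m system."""
    a = m * rho  # offered load
    # Sum of a^k / k! for k = 0 .. m-1
    head = sum(a**k / math.factorial(k) for k in range(m))
    tail = a**m / (math.factorial(m) * (1.0 - rho))
    return tail / (head + tail)

def response_time(m: int, rho: float, service_time: float = 1.0) -> float:
    """Mean response time: service time plus mean queueing delay."""
    queue_delay = erlang_c(m, rho) * service_time / (m * (1.0 - rho))
    return service_time + queue_delay

def knee(m: int, steps: int = 10_000) -> float:
    """Utilization in (0, 1) that minimizes response_time / utilization.

    Assumption: this ratio is the definition of the knee being used.
    """
    candidates = (i / steps for i in range(1, steps))
    return min(candidates, key=lambda rho: response_time(m, rho) / rho)

if __name__ == "__main__":
    for channels in (1, 2, 4, 8):
        print(channels, round(knee(channels), 3))
```

For a single service channel the search lands at 50% utilization, and the knee moves higher as channels are added. As the recap says, a real system should be expected to have knees at or below these idealized figures.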
Capacity Planning
Understanding the knee can collapse
a lot of complexity out of your capacity
planning. It works like this:
• Your goal capacity for a given resource is the amount at which you can comfortably run your tasks at peak times without driving utilizations beyond your knees.
• If you keep your utilizations less than your knees, your system behaves roughly linearly: no big hyperbolic surprises.
• If you are letting your system run any of its resources beyond their knee utilizations, however, then you have performance problems (whether you are aware of them or not).
• If you have performance problems, then you don’t need to be spending your time with mathematical models; you need to be spending your time fixing those problems by rescheduling load, eliminating load, or increasing capacity.
That’s how capacity planning fits
into the performance management
process.
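The check described above amounts to comparing each resource's peak utilization with its knee. Here is a minimal sketch of that comparison. The function name `resources_over_knee`, the resource names, and every number in the example are hypothetical; real knee values would come from your own model of each resource.

```python
def resources_over_knee(peak_utilization, knee_utilization):
    """Return {resource: excess} for resources whose peak exceeds their knee."""
    over = {}
    for resource, peak in peak_utilization.items():
        knee = knee_utilization[resource]
        if peak > knee:
            over[resource] = round(peak - knee, 4)
    return over

# Hypothetical measurements and knee values, for illustration only.
peaks = {"cpu": 0.82, "disk": 0.41, "io_controller": 0.55}
knees = {"cpu": 0.74, "disk": 0.50, "io_controller": 0.50}

flagged = resources_over_knee(peaks, knees)
for resource, excess in flagged.items():
    print(f"{resource}: {excess:.0%} over its knee")
```

Any resource that appears in the result is a candidate for rescheduling load, eliminating load, or adding capacity.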
Random Arrivals
You might have noticed that I used the
term random arrivals several times.
Why is that important?
Some systems have something that
you probably do not have right now:
completely deterministic job scheduling. Some systems—though rare these
days—are configured to allow service
requests to enter the system in absolute
robotic fashion, say, at a pace of one task
per second. And by this, I don’t mean at
an average rate of one task per second
(for example, two tasks in one second
and zero tasks in the next); I mean one
task per second, as a robot might feed
car parts into a bin on an assembly line.
If arrivals into your system behave completely deterministically (meaning that you know exactly when the next service request is coming), then you can run resource utilizations beyond their knee utilizations without necessarily creating a performance problem.
On a system with deterministic arrivals,
your goal is to run resource utilizations
up to 100% without cramming so much
workload into the system that requests
begin to queue.
The reason the knee value is so important on a system with random
arrivals is that these tend to cluster and
cause temporary spikes in utilization.
You need enough spare capacity to absorb these spikes so that users don’t have to endure noticeable queuing delays (which cause noticeable fluctuations in response times) every time a spike occurs.
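A small simulation can make the difference between deterministic and random arrivals concrete. The sketch below is an illustration under simplifying assumptions, not a model of any particular system: a single server, a fixed service time of 0.8 seconds, and one arrival per second on average, so utilization is 80% in both cases. The names `mean_wait` and `compare`, the seed, and all parameter values are hypothetical.

```python
import random

def mean_wait(gaps, service_time):
    """Mean queueing delay on one FIFO server, given the gaps between arrivals.

    The first request finds the server idle; each gap then yields the wait
    of the next request (work left over when it arrives, never below zero).
    """
    wait = 0.0
    total = 0.0
    count = 0
    for gap in gaps:
        wait = max(0.0, wait + service_time - gap)
        total += wait
        count += 1
    return total / count

def compare(n=100_000, service_time=0.8, mean_gap=1.0, seed=1):
    """Return (deterministic mean wait, random mean wait) at equal utilization."""
    rng = random.Random(seed)
    deterministic = [mean_gap] * n  # exactly one arrival per mean_gap
    randomized = [rng.expovariate(1.0 / mean_gap) for _ in range(n)]
    return mean_wait(deterministic, service_time), mean_wait(randomized, service_time)

det_wait, rand_wait = compare()
print(f"deterministic arrivals: mean wait {det_wait:.2f} s")
print(f"random arrivals:        mean wait {rand_wait:.2f} s")
```

With evenly spaced arrivals, no request ever waits, because each one finishes before the next arrives. With random arrivals at the same average rate, requests cluster and queue behind one another, which is the effect that spare capacity has to absorb.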
Temporary spikes in utilization
beyond your knee value for a given
resource are OK as long as they don’t
exceed a few seconds in duration. But
how many seconds are too many? I believe (but have not yet tried to prove)
that you should at least ensure that your
spike durations do not exceed eight
seconds. (You will recognize this number if you’ve heard of the “eight-second
rule.” 2) The answer is certainly that if
you’re unable to meet your percentile-based response time promises or your
throughput promises to your users,
then your spikes are too long.
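One way to put a number on spike duration is to scan a series of utilization samples for the longest unbroken run above the knee. This is a minimal sketch that assumes evenly spaced samples; the function name `longest_spike_seconds`, the sample values, and the knee of 0.5 are hypothetical.

```python
def longest_spike_seconds(samples, knee, interval_seconds=1.0):
    """Length in seconds of the longest consecutive run of samples above knee."""
    longest = 0
    current = 0
    for utilization in samples:
        # Extend the current run while above the knee, otherwise reset it.
        current = current + 1 if utilization > knee else 0
        longest = max(longest, current)
    return longest * interval_seconds

# Hypothetical one-second utilization samples for a single resource.
samples = [0.4, 0.7, 0.8, 0.3, 0.9, 0.9, 0.9, 0.2]
spike = longest_spike_seconds(samples, knee=0.5)
print(f"longest spike above the knee: {spike:.0f} s")
```

The result can be compared with whatever duration limit you settle on, such as the tentative eight seconds suggested above, but the test that matters remains whether you are meeting your response time and throughput promises.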
Coherency Delay
Your system does not have theoretically perfect scalability. Even if I have never studied your system specifically, it is
a pretty good bet that no matter what
computer application system you are
thinking of right now, it does not meet
the M/M/m “theoretically perfect scalability” assumption. Coherency delay is