no load. As load ramps up, you sense a slight, gradual degradation in response time. That gradual degradation does not really do much harm, but as load continues to ramp up, response time begins to degrade in a manner that's neither slight nor gradual. Rather, the degradation becomes quite unpleasant and, in fact, hyperbolic.
Response time (R), in the perfect-scalability M/M/m model, consists of two components: service time (S) and queuing delay (Q); that is, R = S + Q. Service time is the duration that a task spends consuming a given resource, measured in time per task execution, as in seconds per click. Queuing delay is the time that a task spends enqueued at a given resource, awaiting its opportunity to consume that resource. Queuing delay is also measured in time per task execution (for example, seconds per click).
In other words, when you order lunch at Taco Tico, your response time (R) for getting your order is the queuing delay time (Q) that you spend in front of the counter waiting for someone to take your order, plus the service time (S) you spend waiting for your order to hit your hands once you begin talking to the order clerk. Queuing delay is the difference between your response time for a given task and the response time for that same task on an otherwise unloaded system (don't forget our perfect scalability assumption).
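To make these definitions concrete, here is a minimal Python sketch (the numbers are hypothetical, not from the article) using the m = 1 case of M/M/m, where the classic result R = S/(1 − ρ) holds. Queuing delay then falls out as the loaded response time minus the response time on an otherwise unloaded system, which by the perfect-scalability assumption is just S:

```python
# Hypothetical numbers: a task whose service time is 0.5 seconds per click,
# on a single service channel (M/M/1) running at 80% utilization.
S = 0.5    # service time, seconds per click
rho = 0.8  # utilization

# Classic M/M/1 result: response time R = S / (1 - rho)
R = S / (1 - rho)

# Queuing delay = response time minus the response time of the same task
# on an otherwise unloaded system (which is simply S)
Q = R - S

print(f"R = {R:.2f} s, S = {S:.2f} s, Q = {Q:.2f} s")
# prints: R = 2.50 s, S = 0.50 s, Q = 2.00 s
```

Note how the queuing component (2.0 s) dwarfs the service component (0.5 s) at 80% utilization, even though the system scales perfectly: R = S + Q holds throughout.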
Table 1. M/M/m knee values for common values of m.

    Service channel count (m)    Knee utilization
    1                            50%
    2                            57%
    4                            66%
    8                            74%
    16                           81%
    32                           86%
    64                           89%
    128                          92%
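The values in Table 1 can be reproduced numerically. The sketch below is my own illustration, not code from the article: it computes M/M/m response time with the standard Erlang C formula and scans utilization for the minimum of R/ρ, which is where the knee lies. Service time S is normalized to 1, since the knee's location does not depend on S.

```python
from math import inf

def mmm_response_time(m, rho, S=1.0):
    """Mean response time R = S + Q for an M/M/m queue at utilization rho.

    Uses the standard Erlang C formula for the probability that an
    arriving task must queue."""
    a = m * rho                      # offered load in erlangs
    s, term = 0.0, 1.0               # term tracks a**k / k!
    for k in range(m):
        s += term
        term *= a / (k + 1)          # after the loop, term == a**m / m!
    erlang_c = term / ((1 - rho) * s + term)
    Q = erlang_c * S / (m * (1 - rho))   # mean queuing delay
    return S + Q

def knee(m, step=0.0005):
    """Scan utilization for the minimum of R/rho (the knee)."""
    best_rho, best_ratio = None, inf
    rho = step
    while rho < 0.999:
        ratio = mmm_response_time(m, rho) / rho
        if ratio < best_ratio:
            best_rho, best_ratio = rho, ratio
        rho += step
    return best_rho

for m in (1, 2, 4, 8, 16, 32, 64, 128):
    print(f"m = {m:3d}  knee = {knee(m):.1%}")
```

For m = 1 this recovers the familiar 50% knee (the minimum of S/(ρ(1 − ρ))), and for m = 2 it lands at 1/√3 ≈ 57.7%; the remaining values should agree with Table 1 up to rounding.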
The Knee

When it comes to performance, you want two things from a system:
• The best response time you can get: you do not want to have to wait too long for tasks to get done.
• The best throughput you can get: you want to be able to cram as much load as you possibly can onto the system so that as many people as possible can run their tasks at the same time.

Unfortunately, these two goals are contradictory. Optimizing to the first goal requires you to minimize the load on your system; optimizing to the second goal requires you to maximize it. You cannot do both simultaneously. Somewhere in between, at some load level (that is, at some utilization value), lies the optimal load for the system.

The utilization value at which this optimal balance occurs is called the knee. This is the point at which throughput is maximized with minimal negative impact to response times. (I am engaged in an ongoing debate about whether it is appropriate to use the term knee in this context. For the time being, I shall continue to use it; see the sidebar for details.) Mathematically, the knee is the utilization value at which response time divided by utilization is at its minimum. One nice property of the knee is that it occurs at the utilization value where a line through the origin is tangent to the response-time curve. On a carefully produced M/M/m graph, you can locate the knee quite nicely with just a straightedge, as shown in Figure 2.

Another nice property of the M/M/m knee is that you need to know the value of only one parameter to compute it. That parameter is the number of parallel, homogeneous, independent service channels. A service channel is a resource that shares a single queue with other identical resources, such as a booth in a toll plaza or a CPU in an SMP (symmetric multiprocessing) computer.
Relevance of the Knee

How important can this knee concept be, really? After all, as I've told you, the M/M/m model assumes this ridiculously utopian idea that the system you are thinking about scales perfectly. I know what you are thinking: it doesn't.

What M/M/m does give us is the knowledge that even if your system did scale perfectly, you would still be stricken with massive performance problems once your average load exceeded the knee values in Table 1. Your system
Figure 2. The knee occurs at the utilization at which a line through the origin is tangent to the response-time curve. [Figure plots response time (R) against utilization (ρ) for M/M/4 (knee at ρ = 0.665006) and M/M/16 (knee at ρ = 0.810695).]