Open Debate About Knees
In this article, I write about knees in
performance curves, their relevance,
and their application. Whether it is even
worthwhile to try to define the concept
of knee, however, has been the subject of
debate going back at least 20 years.
There is significant historical basis to
the idea that the thing I have described
as a knee in fact is not really meaningful.
In 1988, Stephen Samson argued that,
at least for M/M/1 queuing systems,
no “knee” appears in the performance
curve. “The choice of a guideline number
is not easy, but the rule-of-thumb makers
go right on. In most cases there is not a
knee, no matter how much we wish to
find one,” wrote Samson.[3]
The whole problem reminds me, as I
wrote in 1999,[2] of the parable of the frog
and the boiling water. The story says that
if you drop a frog into a pan of boiling
water, he will escape. But if you put a frog
into a pan of cool water and slowly heat
it, then the frog will sit patiently in place
until he is boiled to death.
With utilization, just as with boiling
water, there is clearly a “death zone,” a
range of values in which you can’t afford
to run a system with random arrivals. But
where is the border of the death zone? If
you are trying to implement a procedural
approach to managing utilization, you
need to know.
My friend Neil Gunther (see
http://en.wikipedia.org/wiki/Neil_J._Gunther
for more information about Neil) has
argued to me privately that, first,
the term knee is completely the wrong
word to use here, in the absence of a
functional discontinuity. Second, he
asserts that the boundary value of 0.5 for
an M/M/1 system is wastefully low, that
you ought to be able to run such a system
successfully at a much higher utilization
value than that. Finally, he argues that
any such special utilization value should
be defined expressly as the utilization
value beyond which your average
response time exceeds your tolerance for
average response time (Figure A). Thus,
Gunther argues that any useful
not-to-exceed utilization value is derivable
only from inquiries about human
preferences, not from mathematics.
(See http://www.cmg.org/measureit/issues/mit62/m_62_15.html
for more information about his argument.)
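
For an M/M/1 system, Gunther's definition can be worked out directly. The sketch below assumes the standard M/M/1 result R = S/(1 − ρ), where S is the mean service time (a symbol this sidebar does not otherwise use), and solves it for the utilization at which R equals the tolerance T. The function names are illustrative only. With the assumed S = 1 and T = 10, it should agree with the ρT = 0.900 shown in Figure A.

```python
def mm1_response_time(rho, service_time=1.0):
    """Mean response time of an M/M/1 queue at utilization rho (0 <= rho < 1)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - rho)


def mm1_max_utilization(tolerance, service_time=1.0):
    """Utilization at which the M/M/1 mean response time equals `tolerance`."""
    if tolerance <= service_time:
        raise ValueError("tolerance must exceed the service time")
    # Solve S / (1 - rho) = T for rho.
    return 1.0 - service_time / tolerance


# Assumption for illustration: S = 1 time unit, T = 10 time units.
rho_t = mm1_max_utilization(tolerance=10.0)
print(round(rho_t, 3))
```

Note that nothing in this calculation locates a feature of the curve itself; the answer moves wherever T moves, which is the heart of Gunther's point.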
The problem I see with this argument
is illustrated in Figure B. Imagine that
your tolerance for average response
time is T, which creates a maximum
tolerated utilization value of ρT. Notice
that even a tiny fluctuation in average
utilization near ρT will result in a huge
fluctuation in average response time.
I believe that your customers feel the
variance, not the mean. Perhaps they say
they will accept average response times
up to T, but humans will not be tolerant
of performance on a system when a 1%
change in average utilization over a
one-minute period results in, say, a tenfold
increase in average response time over
that period.
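
To get a feel for this sensitivity, here is a minimal sketch that evaluates the mean response time of an M/M/m queue at a few utilization values. It assumes the textbook Erlang C formula for the probability of queueing and a mean service time S = 1; neither is spelled out in this sidebar, and the function names are illustrative. For m = 8 the result at ρ = 0.987 should come out close to the T = 10 of Figure B, and the values just below and above it show how steep the curve is there.

```python
from math import factorial


def erlang_c(m, rho):
    """Probability that an arriving request must queue in an M/M/m system."""
    a = m * rho  # offered load in erlangs
    tail = a**m / factorial(m) / (1.0 - rho)
    head = sum(a**k / factorial(k) for k in range(m))
    return tail / (head + tail)


def mmm_response_time(m, rho, service_time=1.0):
    """Mean response time of an M/M/m queue at per-server utilization rho."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    wait = erlang_c(m, rho) * service_time / (m * (1.0 - rho))
    return service_time + wait


# Assumption for illustration: m = 8 servers, S = 1 time unit.
for rho in (0.745, 0.977, 0.987, 0.997):
    print(f"rho={rho:.3f}  R={mmm_response_time(8, rho):.2f}")
```

The exact multiple you see for a one-point change in utilization depends on m and on where you start, so treat the printed numbers as an illustration of the shape of the curve rather than as a rule.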
I do understand the perspective
that the knee values I’ve listed in
this article are below the utilization
values that many people feel safe in
exceeding, especially for lower-order
systems such as M/M/1. It is important,
however, to avoid running resources at
average utilization values where small
fluctuations in utilization yield too-large
fluctuations in response time.
Having said that, I do not yet
have a good definition for a too-large
fluctuation. Perhaps, like response-time
tolerances, different people have
different tolerances for fluctuation. But
perhaps there is a fluctuation tolerance
factor that applies with reasonable
universality across all users. The Apdex
Application Performance Index standard,
for example, assumes the response time
F at which users become “frustrated”
is universally four times the response
time T at which their attitude shifts from
being “satisfied” to merely “tolerating.”[1]
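
As an illustration of the Apdex convention, the sketch below classifies response-time samples using F = 4T and computes the usual Apdex score: the count of satisfied samples plus half the count of tolerating samples, divided by the total. The function name and the sample data are hypothetical, chosen only to show the arithmetic.

```python
def apdex_score(response_times, t):
    """Apdex score for the samples, with thresholds T = t and F = 4 * t."""
    if not response_times:
        raise ValueError("need at least one sample")
    f = 4.0 * t  # "frustrated" threshold assumed by Apdex
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= f)
    return (satisfied + tolerating / 2.0) / len(response_times)


# Hypothetical samples in seconds, with T = 1.0 (so F = 4.0).
samples = [0.4, 0.9, 1.2, 3.5, 4.1, 0.7]
print(round(apdex_score(samples, t=1.0), 3))
```

Here three samples are satisfied, two are tolerating, and one is frustrated, so the score is (3 + 2/2) / 6.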
The knee, regardless of how you
define it or what we end up calling it, is
an important parameter to the
capacity-planning procedure that I described
earlier in the main text of this article, and
I believe it is an important parameter
to the daily process of computer system
workload management.
I will keep studying.
References
1. Apdex; http://www.apdex.org.
2. Millsap, C. Performance management: myths and
facts (1999); http://method-r.com.
3. Samson, S. MVS performance legends. In
Proceedings of Computer Measurement Group
Conference (1988), 148–159.
Figure A. Gunther’s maximum allowable utilization value ρT is defined as the utilization
producing the average response time T. (Plot: M/M/1 system, T = 10; response time R
versus utilization ρ; ρT = 0.900.)
Figure B. Near the ρT value, small fluctuations in average utilization result in large
response-time fluctuations. (Plot: M/M/8 system, T = 10; response time R versus
utilization ρ; marked utilization values 0.744997 and ρT = 0.987.)