fective, but proactively ensuring such
effectiveness is difficult, and many
business practices (such as continually moving people on and off projects)
make us less effective, though it is hard
to quantify just how much.
Time: Measuring Duration
One would think this would be very
easy—just look at the clock or the calendar. But some organizations do not
even record and retain information on
when their projects start, and even when
they do, what they measure may be quite
ambiguous. Imagine a project against
whose time recording system someone logged one hour of requirements
activity two years ago, while the project
team came on board yesterday. Did the
project “start” two years ago or did it
start yesterday? In this case the answer
is pretty clear. But suppose a year ago,
five people recorded a month’s worth
of work, then six months ago 10 person-months were recorded, and three
months ago 20 person-months were recorded. Now when did the project start?
Such a slow ramp-up is not unusual
and it means there simply was no one
discrete point in time when the project
“started.” And if we cannot define when
a project started, clearly we cannot say
how long it took to complete.
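The ramp-up scenario above can be made concrete with a small sketch. The dates, the `effort_log` structure, and the two "start" rules below are illustrative assumptions, not definitions taken from any particular metrics program; the point is only that the same log yields very different durations depending on the rule chosen.

```python
from datetime import date

# Hypothetical effort log: (date recorded, person-months), mirroring the ramp-up in the text.
effort_log = [
    (date(2023, 1, 1), 5),    # a year ago: five people, one month each
    (date(2023, 7, 1), 10),   # six months ago
    (date(2023, 10, 1), 20),  # three months ago
]
today = date(2024, 1, 1)


def start_first_entry(log):
    """Rule A (assumed): the project starts at the first recorded effort of any size."""
    return min(d for d, _ in log)


def start_at_threshold(log, fraction):
    """Rule B (assumed): the project starts when cumulative effort first reaches
    `fraction` of the total effort recorded so far."""
    total = sum(pm for _, pm in log)
    running = 0
    for d, pm in sorted(log):
        running += pm
        if running >= fraction * total:
            return d
    raise ValueError("log is empty or fraction is greater than 1")


for label, start in [
    ("first entry", start_first_entry(effort_log)),
    ("25% of effort", start_at_threshold(effort_log, 0.25)),
    ("50% of effort", start_at_threshold(effort_log, 0.50)),
]:
    print(f"{label}: started {start}, duration {(today - start).days} days")
```

None of these rules is "correct"; what matters for comparing projects is that one rule is chosen and applied consistently.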
Effort: Measuring Cost
Organizations routinely play with both
the effort recorded (declining to record
overtime, for instance) and with the effort-to-cost ratio by not paying for this
overtime or by hiring cheaper offshore
resources. These practices introduce
significant variance in such measures,
even when they are actually taken.
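As a rough illustration of how these practices distort the numbers, the sketch below models three hypothetical projects that perform identical work but differ in whether overtime is recorded, whether it is paid, and what hourly rate applies. The `Project` class, its fields, and all figures are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Project:
    regular_hours: float
    overtime_hours: float
    hourly_rate: float
    records_overtime: bool
    pays_overtime: bool

    def actual_effort(self) -> float:
        """Hours really worked, whether or not anyone wrote them down."""
        return self.regular_hours + self.overtime_hours

    def recorded_effort(self) -> float:
        """Hours that appear in the time recording system."""
        return self.regular_hours + (self.overtime_hours if self.records_overtime else 0.0)

    def cost(self) -> float:
        """Money spent, which depends on whether overtime is paid."""
        paid = self.regular_hours + (self.overtime_hours if self.pays_overtime else 0.0)
        return paid * self.hourly_rate


# Same actual work (1,200 hours) under three hypothetical policies.
projects = {
    "records and pays overtime": Project(1000, 200, 100.0, True, True),
    "hides and does not pay overtime": Project(1000, 200, 100.0, False, False),
    "cheaper staff, overtime paid": Project(1000, 200, 40.0, True, True),
}

for name, p in projects.items():
    print(f"{name}: recorded {p.recorded_effort():.0f} h, "
          f"cost {p.cost():.0f}, cost per actual hour {p.cost() / p.actual_effort():.2f}")
```

The actual effort is the same in every case, yet both the recorded effort and the effort-to-cost ratio differ, which is the variance the text describes.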
What we really want to measure is the knowledge content of the delivered software but there is no way to do that.
Reliability: Measuring Quality
This is a perennial challenge in our
business. Since defects are mostly
deemed to be “bad” there are personal
and organizational pressures that operate against consistent measurement.
Since a “defect” is simply the identification of knowledge that was not gained,
measuring defects suffers from some
of the same challenges as measuring
system size. In some ways, we can view
defects as key and valuable indicators
of the knowledge acquisition process.
But we still have a lot of variability in
how we define and measure them.
Intrinsic and Artificial Variability
In measurement of software processes and products, there are two types
of variability: intrinsic and artificial.
Intrinsic variability simply exists in
the situation whereas artificial variability is a function of the artifice of
measurement. For example: Intrinsic
variability occurs in predicting the
size of a finished system in order to
estimate a project—before we build
a system we simply cannot know exactly how big it will be. We can make
an educated guess, we can compare
against similar system history, we
can extrapolate from what we have
seen before, but we do not know. Artificial variability occurs due to variation in the format of metrics and how
we collect them. If we count system
use cases as a size metric and one
person writes highly detailed requirements while another dashes off vague
and perfunctory products, what “use
case” means is quite different. The
knowledge content of one person’s
use case can vary a lot from the knowledge content of another person’s use
case. When recording time, if one
project counts its start as the very first
moment any time is recorded for any
activity, while another waits until everyone is on board, the meaning of
“start” will be quite variable.
We do not have control over the
intrinsic variability—we cannot definitively know precisely how big a
system will be before we build it, for
instance—but we can and should
manage the artificial variability. Good
metrics programs do this: they apply
structure to requirements document
formats and density, they define usable and practical measures of time
and effort, they carefully define what
a defect is and how severe it might be.
And they embrace the intrinsic uncertainty as something to be identified
and quantified rather than denied
and hidden away.
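One simple way to quantify intrinsic uncertainty rather than hide it is to report a size estimate as a range derived from history instead of a single number. The sketch below is a minimal illustration under stated assumptions: `history_ratios` is an invented set of actual-to-estimated size ratios from past projects, and the low/likely/high rule (minimum, median, maximum) is just one plausible choice, not a recommended standard.

```python
import statistics

# Hypothetical history: actual size divided by estimated size for past projects.
history_ratios = [0.9, 1.0, 1.1, 1.2, 1.5]


def size_range(point_estimate, ratios):
    """Return (low, likely, high) by scaling a point estimate with historical ratios."""
    return (
        point_estimate * min(ratios),
        point_estimate * statistics.median(ratios),
        point_estimate * max(ratios),
    )


# A new project estimated at 200 size units (use cases, for example).
low, likely, high = size_range(200, history_ratios)
print(f"low {low:.0f}, likely {likely:.0f}, high {high:.0f}")
```

The range is only meaningful if the historical ratios were collected with consistent definitions, which is why reducing artificial variability comes first.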
The fact that some things are not
particularly measurable does not
mean measurements are not useful.
The fact that there is uncertainty all
around us does not mean we should
pretend things are not uncertain.
There is a lot we can do to remove or
reduce artificial variability and one
of our primary reasons to do this is
to expose the irremovable intrinsic
uncertainty so we can measure it
and make important decisions based
upon it. To make metrics more usable and useful, we have to strip off
this artificial variability, and we do
this by looking at the metrics and the
metrics collection process and by applying some, well, control.
To reverse engineer the statements
of Lord Kelvin and Tom DeMarco:
if you do not improve it, you cannot
measure it very well and, to some extent, we cannot measure what we do
not control.
References
1. Armour, P.G. Not defect: The mature discipline of testing. Commun. ACM 47, 10 (Oct. 2004), 15–18.
2. Armour, P.G. The conservation of uncertainty. Commun. ACM 50, 9 (Sept. 2007), 25–28.
3. DeMarco, T. Controlling Software Projects. Prentice Hall, Upper Saddle River, NJ, 1986.
4. Putnam, L.H. and Myers, W. Five Core Metrics. Dorset House, New York, NY, 2003, 33.
Phillip G. Armour (firstname.lastname@example.org) is a senior
consultant at Corvus International Inc., Deer Park, IL,
and a consultant at QSM Inc., McLean, VA.
Copyright held by author.