vide details about the current size of
a project. Even though there are multiple definitions of what constitutes
a line of code, such a metric can be
used to reason about whether the examined code base is complete or contains extraneous code such as copied-in libraries. To do this, however, the
metric should be placed in context,
bringing us to our first pitfall.
Metric in a bubble. Using a metric
without proper interpretation. Recognized by not being able to explain what
a given value of a metric means. Can be
solved by placing the metric inside a context with respect to a goal.
The usefulness of a single data
point of a metric is limited. Knowing
that a system has 100,000 LOC is meaningless by itself, since the number
alone does not explain whether the system is
large or small. To be useful, the value
of the metric should, for example, be
compared against data points taken
from the history of the project or from
a benchmark of other projects. In the
first scenario, you can discover trends
that should be explained by external
events. For example, the graph in Figure 1 shows the LOC of a software system from January 2010 to July 2011.
The first question that comes to
mind here is: “Why did the size of the
system drop so much in July 2010?”
If the answer to this question is, “We
removed a lot of open source code
we copied in earlier,” then there is
no problem (other than the inclusion
of this code in the first place). If the
answer is, “We accidentally deleted
part of our code base,” then it might
be wise to introduce a different process of source-code version management. In this case the answer is that
an action was scheduled to drastically
reduce the amount of configuration
needed; given the amount of code that
was removed, this action was apparently successful.
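The two kinds of context described above, the history of the project and a benchmark of other projects, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tooling behind Figure 1: the function names `percentile_rank` and `largest_change` are hypothetical, and every number in the sample data is invented for the example.

```python
from bisect import bisect_left


def percentile_rank(value, benchmark):
    """Percentage of benchmark systems that are strictly smaller than `value`."""
    ordered = sorted(benchmark)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


def largest_change(history):
    """Return (label, relative_change) for the biggest step between
    two consecutive measurements in `history`."""
    best_label, best_change = None, 0.0
    for (_, prev), (label, cur) in zip(history, history[1:]):
        change = (cur - prev) / prev
        if abs(change) > abs(best_change):
            best_label, best_change = label, change
    return best_label, best_change


# Hypothetical data: LOC of other systems, and monthly LOC of one system.
benchmark_loc = [12_000, 45_000, 80_000, 150_000, 400_000]
history = [
    ("2010-05", 140_000),
    ("2010-06", 145_000),
    ("2010-07", 100_000),
    ("2010-08", 104_000),
]

rank = percentile_rank(100_000, benchmark_loc)
label, change = largest_change(history)
print(f"Larger than {rank:.0f}% of the benchmark systems")
print(f"Largest change: {change:+.0%} in {label}")
```

The output does not answer anything by itself; it only points to the month, or to the position relative to other systems, that needs an explanation.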
Note that one of the benefits of placing metrics in context is that it allows
you to focus on the important part of
the graph. Questions regarding what
happened at a certain point in time
or why the value significantly deviates
from other systems become more important than the specific details about
how the metric is measured. Often
people, either on purpose or by accident, try to steer a discussion toward
“How is this metric measured?” instead of “What do these data points
tell me?” In most cases the exact construction of a metric is not important
for the conclusion drawn from the data.
For example, consider the three plots
shown in Figures 2 and 3, representing different ways of computing the
volume of a system. Figure 2 shows
the lines of code counted as every
line containing at least one character
that is not a comment or white space
(blue) and lines of code counted as all
newline characters (orange). Figure 3
shows the number of files used.
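To make the three definitions concrete, here is a small Python sketch that computes all of them for the same input. It is an illustration only and rests on a loud simplification: a line counts as a comment only when it starts with `#`, so block comments and other languages' comment syntax are not handled. The function name `volume_metrics` and the sample files are hypothetical.

```python
def volume_metrics(files):
    """Compute three volume metrics for `files`, a mapping of
    file name -> file content (str).

    Assumption: a comment is a line whose first non-blank character is '#'.
    """
    code_lines = 0
    newlines = 0
    for text in files.values():
        # Definition 2: every newline character counts.
        newlines += text.count("\n")
        # Definition 1: lines with at least one character that is
        # neither white space nor part of a comment.
        for line in text.splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                code_lines += 1
    # Definition 3: the number of files.
    return {"code_lines": code_lines, "newlines": newlines, "files": len(files)}


# Hypothetical two-file code base.
sample = {
    "a.py": "# header\n\nx = 1\ny = 2  # trailing\n",
    "b.py": "print('hi')\n",
}
print(volume_metrics(sample))
```

The three numbers differ in scale, but as long as every snapshot and every system is measured with the same definition, each one can serve as the volume metric.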
The trend lines indicate that, even
though the scale differs, these volume metrics all show the same events.
This means that each of these metrics is a good candidate to compare
the volume of a system against other
systems. As long as the volume of the
other systems is measured in the same
manner, the conclusions drawn from
the data will be very similar.
The different trend lines bring up
a second question: “Why does the volume decrease after a period in which
the volume increased?” The answer
can be found in the normal way in
which alterations are made to this
particular system. When the volume
of the system increases, an action is
scheduled to determine whether new
abstractions are possible, which is
usually the case. This type of refactoring can significantly decrease the
size of the code base, which results in
lower maintenance effort and easier
ways to add functionality to the system. Thus, the goal here is to reduce
maintenance effort by, among other things,
keeping the size of the code base relatively small.
In the ideal situation a direct relationship exists between a desired goal
(such as reduced maintenance effort)
and a metric (such as a small code
base). In some cases this relationship
is based on informal reasoning (for example, when the code base of a system
is small it is easier to analyze what the
system does); in other cases scientific
research has shown that the relationship exists. What is important here is
that you determine both the nature of
the relationship between the metric
and the goal (direct/indirect) and the
strength of this relationship (informal
reasoning/empirically validated).