Code Spelunking Redux
means more lines of code can now be run in the same
amount of time. Available memory gets larger, so we can
now keep more state or code in memory. Disks get larger
and require less power (in the case of flash), and suddenly
we’re able to carry around what were once considered
huge amounts of data in our pockets. What was termed
the “software crisis” in the 1970s has never really abated,
because each time software engineers came up with a new
way of working that reduced complexity, the industry
moved forward and demanded more.
Complexity increases in many directions, including
lines of code, numbers of modules, and numbers of
systems and subsystems. More complex systems require
more lines of code to implement. As they grow their
systems, software teams often integrate more code from
outside resources, which leads to complex interactions
between systems that may not have been designed with
massive integration in mind.
These numbers should not be surprising to any software
engineer, but they are a cause for concern. Although
it was unlikely that the numbers would shrink, all but
one of them have grown by more than 50 percent, and
although the number of lines may have grown linearly,
the interactions among the new components that these
numbers represent have not grown in a linear fashion.
If we assume that all modules in a system can interact
freely with all other modules, then we have a system in
which the potential number of interactions is expressed
as n(n-1)/2, an equation that should be familiar to those
who work in networking, as it represents a fully connected
network. If a system grows from 100 modules to 200
modules, a 100 percent growth rate, then the number of
potential connections grows from 4,950 to 19,900, a 302
percent growth rate.
One reliable measure of the number of interfaces into
a system is the number of system calls provided to user
programs by an operating-system kernel. Since the
publication of my first article on code spelunking, the Linux
kernel has grown from just shy of 200 system calls to 313,
an increase of more than 50 percent (see table 1).6
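The fully connected estimate n(n-1)/2 and the 100-to-200-module figures quoted in the text can be checked with a few lines of Python. This is only an illustrative sketch: the function names `potential_connections` and `percent_growth` are made up for this example and do not come from the article.

```python
def potential_connections(n: int) -> int:
    """Pairwise connections in a fully connected system of n modules: n(n-1)/2."""
    if n < 0:
        raise ValueError("module count must be non-negative")
    return n * (n - 1) // 2


def percent_growth(before: float, after: float) -> float:
    """Growth from `before` to `after`, as a percentage of `before`."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    before = potential_connections(100)   # 100 * 99 / 2
    after = potential_connections(200)    # 200 * 199 / 2
    # Doubling the module count roughly quadruples the potential connections.
    print(before, after, round(percent_growth(before, after)))  # 4950 19900 302
```

Doubling n multiplies n(n-1)/2 by a little more than four, which is why a 100 percent growth in modules shows up as roughly 300 percent growth in potential interactions.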
TWO NEW TOOLS
My first article on code spelunking covered several tools,
including global,3 Cscope,1 gprof,4 ktrace,7 and truss.11 I
continue to use these tools on a daily basis, but in the
past five years two new tools have come to my attention:
Doxygen2 and DTrace.9 Although they may not have
been specifically designed with code spelunking in mind,
both make significant contributions to the field. Here,
I discuss each tool and how it can help us understand
large code bases.
Doxygen. Right at the top of the Doxygen Web page2
is the following statement: “Doxygen is a documentation system for C++, C, Java, Objective-C, Python, IDL
(Corba and Microsoft flavors), Fortran, VHDL, PHP, C#,
and to some extent D.” As the blurb says, Doxygen was
designed with documenting source code in mind—and
it is quite a good system for documenting source code
so that the output is usable as man pages and manuals—but it has a few features that make it applicable to
code spelunking, too.
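For readers who have not seen Doxygen markup, here is a minimal sketch of a documented function in Python, one of the languages the blurb lists. It assumes Doxygen's `##` comment-block convention for Python together with the `@file`, `@brief`, `@param`, and `@return` commands. The file name and the `clamp` function are hypothetical and exist only to carry the comments.

```python
## @file clamp_example.py
#  @brief Hypothetical module used only to illustrate Doxygen markup.


## @brief Clamp a value to a closed range.
#
#  @param value  The number to clamp.
#  @param low    Lower bound of the range.
#  @param high   Upper bound of the range.
#  @return       value limited to the interval [low, high].
def clamp(value, low, high):
    # Ordinary comments like this one are not part of the generated docs;
    # only the ## block above is treated as documentation.
    return max(low, min(value, high))
```

The comment block does not change how the function runs. It exists so that a documentation tool can pair the description, parameters, and return value with the definition that follows it.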
What Doxygen does is read in all, or part, of a source
tree, looking for documentation tags that it can extract
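TABLE 1 Growth in Program Size
Program             Version     Files    Lines        Growth in lines
Apache Web Server   1.3           471      158,332    --
Apache Web Server   2.2.8        1108      374,993    136%
Emacs               21           2586    1,317,915    --
Emacs               22           2598    1,771,282    34%
FreeBSD Kernel      5.1          4758    2,140,517    --
FreeBSD Kernel      7.0          6723    3,556,087    66%
Linux Kernel        2.4.20-8    12417    5,223,290    --
Linux Kernel        2.6.25-3    19483    8,098,992    55%
Python              2.2.3        1158      356,314    --
Python              2.5.2        2379      910,573    155%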
28 November/December 2008 ACM QUEUE
rants: feedback@acmqueue.com