Code Spelunking Redux

means more lines of code can now be run in the same amount of time. Available memory gets larger, so we can now keep more state or code in memory. Disks get larger and require less power (in the case of flash), and suddenly we’re able to carry around in our pockets what were once considered huge amounts of data. What was termed the “software crisis” in the 1970s has never really abated, because each time software engineers came up with a new way of working that reduced complexity, the industry moved forward and demanded more.

Complexity increases in many directions, including lines of code, numbers of modules, and numbers of systems and subsystems. More complex systems require more lines of code to implement. As they grow their systems, software teams often integrate more code from outside sources, which leads to complex interactions between systems that may not have been designed with massive integration in mind.

Table 1 compares an older and a newer release of five large code bases: the Apache Web Server, Emacs, the FreeBSD kernel, the Linux kernel, and Python.

These numbers should not be surprising to any software engineer, but they are a cause for concern. Although it was unlikely that the numbers would shrink, all but one of them have grown by more than 50 percent. And although the number of lines may have grown linearly, the interactions among the new components that these numbers represent have not. If we assume that all modules in a system can interact freely with all other modules, then we have a system in which the potential number of interactions is n(n-1)/2, an equation that should be familiar to those who work in networking, as it represents a fully connected network. If a system grows from 100 modules to 200 modules, a 100 percent growth rate, then the number of potential connections grows from 4,950 to 19,900, a 302 percent growth rate.

One reliable measure of the number of interfaces into a system is the number of system calls that an operating-system kernel provides to user programs. Since the publication of my first article on code spelunking, the Linux kernel has grown from just shy of 200 system calls to 313, an increase of more than 50 percent (see table 1) [6].

TWO NEW TOOLS

My first article on code spelunking covered several tools, including global [3], Cscope [1], gprof [4], ktrace [7], and truss [11]. I continue to use these tools on a daily basis, but in the past five years two new tools have come to my attention: Doxygen [2] and DTrace [9]. Although they may not have been designed specifically with code spelunking in mind, both make significant contributions to the field. Here, I discuss each tool and how it can help us understand large code bases.

Doxygen. Right at the top of the Doxygen Web page [2] is the following statement: “Doxygen is a documentation system for C++, C, Java, Objective-C, Python, IDL (Corba and Microsoft flavors), Fortran, VHDL, PHP, C#, and to some extent D.” As the blurb says, Doxygen was designed with documenting source code in mind (and it is quite a good system for producing man pages and manuals from source code), but it has a few features that make it applicable to code spelunking, too.

What Doxygen does is read in all, or part, of a source tree, looking for documentation tags that it can extract

TABLE 1 Growth in Program Size

Program             Version     Files   Lines       Growth (lines)
Apache Web Server   1.3           471     158,332   --
                    2.2.8        1108     374,993   136%
Emacs               21           2586   1,317,915   --
                    22           2598   1,771,282   34%
FreeBSD Kernel      5.1          4758   2,140,517   --
                    7.0          6723   3,556,087   66%
Linux Kernel        2.4.20-8    12417   5,223,290   --
                    2.6.25-3    19483   8,098,992   55%
Python              2.2.3        1158     356,314   --
                    2.5.2        2379     910,573   155%
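The growth column in table 1 and the module-connection figures in the text are simple arithmetic, so they can be checked with a few lines of Python. This is a minimal sketch of our own, not one of the tools discussed here. The line counts are copied from table 1. Rounding down to a whole percent is an assumption, though it happens to reproduce the printed figures.

```python
# A sketch (not from the article) that recomputes the growth column of
# table 1 and the n(n-1)/2 connection counts discussed in the text.
# Assumption: percentages are rounded down to whole numbers.

# Lines of code for the older and newer release of each program (table 1).
TABLE_1_LINES = {
    "Apache Web Server": (158_332, 374_993),
    "Emacs": (1_317_915, 1_771_282),
    "FreeBSD Kernel": (2_140_517, 3_556_087),
    "Linux Kernel": (5_223_290, 8_098_992),
    "Python": (356_314, 910_573),
}


def growth_percent(old: int, new: int) -> int:
    """Percentage increase from old to new, rounded down to a whole percent."""
    return (new - old) * 100 // old


def potential_connections(n: int) -> int:
    """Number of distinct pairs among n modules that may all interact: n(n-1)/2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for program, (old, new) in TABLE_1_LINES.items():
        print(f"{program:<18} {growth_percent(old, new):>4}%")

    before = potential_connections(100)  # 4,950 pairs
    after = potential_connections(200)   # 19,900 pairs
    print(f"{before:,} -> {after:,} connections, {growth_percent(before, after)}% growth")
```

Doubling the module count roughly quadruples the number of possible pairs, because n(n-1)/2 grows with the square of n. That is why a 100 percent increase in modules yields a 302 percent increase in potential connections.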

November/December 2008 ACM QUEUE
