from which we can derive clues as to
how the software is structured. One clue
that is relatively easy to see is that there
is another hot spot in the packet output
code, namely tcp_output(), which
is called from seven different routines.
The kind of information that Doxygen can show comes at a price. Generating the graphs shown here, which
required analyzing 136 files comprising 125,000 lines of code, took 45 minutes on a dual-core 2.5GHz MacBook
Pro laptop. Most of the time was taken
up by generating the call and caller
graphs, which are by far the most useful pieces of information to a code spelunker. 5
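For readers who want to try a run like this, the settings involved are few. The following is a minimal sketch, in Python, that writes out a Doxyfile fragment enabling the features listed in Table 2. The helper name doxyfile_fragment and the choice to set every tag to YES are assumptions made for illustration; this is not necessarily the configuration used to produce the graphs discussed here.

```python
# Tags taken from Table 2. Setting each one to YES is an assumption
# made for this sketch, not a recommendation from the article.
SPELUNKING_TAGS = [
    "EXTRACT_ALL",
    "SOURCE_BROWSER",
    "CLASS_DIAGRAMS",
    "HAVE_DOT",
    "CALL_GRAPH",
    "CALLER_GRAPH",
]


def doxyfile_fragment(tags=SPELUNKING_TAGS):
    """Return Doxyfile text that sets each tag in `tags` to YES."""
    # Pad tag names so the '=' signs line up, as in a generated Doxyfile.
    width = max(len(tag) for tag in tags)
    lines = [f"{tag.ljust(width)} = YES" for tag in tags]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(doxyfile_fragment(), end="")
```

The output can be appended to a template configuration created with doxygen -g. HAVE_DOT only has an effect if the Graphviz dot tool is installed, and, as the timing above suggests, the two graph options are likely to dominate the run time on a large code base.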
DTrace. One of the most talked-about
system tools in the last few years
is DTrace, a project from Sun Microsystems released under the CDDL that has
been ported to the FreeBSD and Mac
OS X operating systems. Regardless of
whether the designers of DTrace were
specifically targeting code spelunking
when they wrote their tool, it is clearly
applicable.
DTrace has several components: a
command line program, a language,
and a set of probes that give information about various events that occur
throughout the system. The system
was designed such that it could be run
against an application for which the
user had no source code.
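The three components meet on a single command line. As a rough sketch (in Python, used only to assemble the arguments; nothing is executed), the helper below combines a probe description with a D clause and optionally attaches to a running process. The function name dtrace_argv and the process ID are hypothetical, and the D one-liner shown, which counts system calls by name, is a commonly used idiom rather than anything taken from this article.

```python
def dtrace_argv(probe, action, predicate=None, pid=None):
    """Build an argument list for the dtrace command-line program.

    probe     -- probe description (provider:module:function:name),
                 for example "syscall:::entry"
    action    -- D language statements to run when the probe fires
    predicate -- optional D expression that filters probe firings
    pid       -- optional process ID to attach to with -p
    """
    program = probe
    if predicate:
        program += f" /{predicate}/"
    program += f" {{ {action} }}"

    argv = ["dtrace", "-n", program]
    if pid is not None:
        argv += ["-p", str(pid)]
    return argv


if __name__ == "__main__":
    # Count system calls by name for one process (1234 is an assumed PID).
    print(dtrace_argv("syscall:::entry",
                      "@[probefunc] = count();",
                      predicate="pid == $target",
                      pid=1234))
```

Passing the resulting list to a process launcher would start the trace; doing so normally requires superuser privileges, which is one reason the sketch stops at building the arguments.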
DTrace is the next logical step in
the line of program-tracing tools
that came before it, such as ktrace and
truss. What DTrace brings to code spelunking is a much richer set of primitives, both in terms of its set of probes
and the D language, which makes it
easier for code spelunkers to answer
the questions they have. A program like
ktrace shows only the system calls that
the traced program executes while it’s running, which are all of the application’s
interactions with the operating system.
On a typical OS these number in the low
hundreds, and while they can give clues
to what a complex piece of software is
doing, they are not the whole story.
Ktrace cannot trace the operating system itself, which is something that can
now be accomplished using DTrace.

Table 1. Comparing the sizes of the systems as discussed in 2003 and today.

Program             Version     Files   Lines       Change in lines
Apache Web server   1.3           471     158,332
                    2.2.8        1108     374,993   136%
Emacs               21           2586   1,317,915
                    22           2598   1,771,282   34%
FreeBSD kernel      5.1          4758   2,140,517
                    7.0          6723   3,556,087   66%
Linux kernel        2.4.20-8    12417   5,223,290
                    2.6.25-3    19483   8,098,992   55%
Python              2.2.3        1158     356,314
                    2.5.2        2379     910,573   155%

Table 2. Features in the Doxyfile.

Feature          Meaning
EXTRACT_ALL      Extract everything you can from the source code.
SOURCE_BROWSER   Create a full cross-reference of the source code.
CLASS_DIAGRAMS   Create class diagrams and inheritance graphs.
HAVE_DOT         Create useful code spelunking graphs.
CALL_GRAPH       Make a call graph following all function calls.
CALLER_GRAPH     Output a graph of the caller dependencies.

Table 3. Providers available in Mac OS X.

Provider    Purpose
dtrace      Probes related to DTrace itself
fbt         Entry and exit points for functions
io          I/O probes
lockstat    Probes related to locking
plockstat   Pthread lock related probes
proc        Process-specific information
profile     Profiling and performance data
syscall     Information on system calls
vminfo      Virtual memory probes

When people discuss DTrace they
often point out the large number of
probes available, which on Mac OS X
is more than 23,000. This is somewhat
misleading. Not all of the probes are
immediately usable, and in reality, having such an embarrassment of riches
makes picking the most useful probes
for a particular job difficult. A probe is
some piece of code in an application, library, or the operating system that can
be instrumented to record information on behalf of the user. The probes
are broken down into several categories based on what they record. Each
probe is delineated by its Provider,
Module, Function, and Name. Providers are named after systems such as io,
lockstat, proc, profile, syscall, vminfo,
and dtrace itself. There are several distinct providers available in Mac OS X,
although naively printing them all will
show you that several exist on a per-process basis. The per-process probes
show information on core data within