from which we can derive clues as to how the software is structured. One clue that is relatively easy to see is that there is another hot spot in the packet output code, namely tcp_output(), which is called from seven different routines.

The kind of information that Doxygen can show comes at a price. Generating the graphs shown here, which required analyzing 136 files comprising 125,000 lines of code, took 45 minutes on a dual-core 2.5GHz MacBook Pro laptop. Most of the time was taken up by generating the call and caller graphs, which are by far the most useful pieces of information to a code spelunker. 5

DTrace. One of the most talked-about system tools in the last few years is DTrace, a project from Sun Microsystems released under the CDDL that has been ported to the FreeBSD and Mac OS X operating systems. Regardless of whether the designers of DTrace were specifically targeting code spelunking when they wrote their tool, it is clearly applicable.

DTrace has several components: a command line program, a language, and a set of probes that give information about various events that occur throughout the system. The system was designed such that it could be run against an application for which the user had no source code.

DTrace is the next logical step in the line of program-tracing tools that came before it, such as ktrace and truss. What DTrace brings to code spelunking is a much richer set of primitives, both in its set of probes and in the D language, which makes it easier for code spelunkers to answer the questions they have. A program like ktrace shows only the system calls that a program executes while it’s running, which are all of the application’s interactions with the operating system. On a typical OS these number in the low hundreds, and while they can give clues to what a complex piece of software is doing, they are not the whole story. Ktrace cannot trace the operating system itself, which is something that can now be accomplished using DTrace.

When people discuss DTrace they often point out the large number of probes available, which on Mac OS X is more than 23,000. This is somewhat misleading. Not all of the probes are

Table 1. Comparing the sizes of the systems as discussed in 2003 and today.

Program             Version      Files    Lines        Chg Lines
Apache Web server   1.3          471      158,332
Apache Web server   2.2.8        1108     374,993      136%
Emacs               21           2586     1,317,915
Emacs               22           2598     1,771,282    34%
FreeBSD Kernel      5.1          4758     2,140,517
FreeBSD Kernel      7.0          6723     3,556,087    66%
Linux Kernel        2.4.20-8     12417    5,223,290
Linux Kernel        2.6.25-3     19483    8,098,992    55%
Python              2.2.3        1158     356,314
Python              2.5.2        2379     910,573      155%
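The Chg Lines column in Table 1 can be recomputed from the two line counts given for each program. The following is a minimal sketch, not something from the original analysis: the function name `percent_growth` and the dictionary layout are illustrative assumptions. The published percentages appear to be truncated rather than rounded, so the sketch uses integer division.

```python
# Line counts copied from Table 1: (lines in 2003, lines today).
LINE_COUNTS = {
    "Apache Web server": (158_332, 374_993),
    "Emacs": (1_317_915, 1_771_282),
    "FreeBSD Kernel": (2_140_517, 3_556_087),
    "Linux Kernel": (5_223_290, 8_098_992),
    "Python": (356_314, 910_573),
}


def percent_growth(old: int, new: int) -> int:
    """Return the growth from old to new as a whole percentage, truncated."""
    return (new - old) * 100 // old


for program, (old, new) in LINE_COUNTS.items():
    print(f"{program}: {percent_growth(old, new)}%")
```

Rounding to the nearest percent instead of truncating would give 137% for Apache and 156% for Python, which is why truncation is assumed here.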

Table 2. Features in the Doxyfile.

Feature           Meaning
EXTRACT_ALL       Extract everything you can from the source code.
SOURCE_BROWSER    Create a full cross-reference of the source code.
CLASS_DIAGRAMS    Create class diagrams and inheritance graphs.
HAVE_DOT          Create useful code spelunking graphs.
CALL_GRAPH        Make a call graph following all function calls.
CALLER_GRAPH      Output a graph of the caller dependencies.
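As a rough illustration of how the features in Table 2 might appear in a Doxyfile, the sketch below emits them in Doxygen's `TAG = VALUE` form. Setting every tag to YES is an assumption, since the table lists only the features and not their values, and the helper name `doxyfile_fragment` is hypothetical. HAVE_DOT also relies on the Graphviz dot tool being installed, and newer Doxygen releases may treat some of these tags differently.

```python
# Tags taken from Table 2. Enabling all of them with YES is an assumption.
SPELUNKING_OPTIONS = [
    "EXTRACT_ALL",
    "SOURCE_BROWSER",
    "CLASS_DIAGRAMS",
    "HAVE_DOT",
    "CALL_GRAPH",
    "CALLER_GRAPH",
]


def doxyfile_fragment(options=SPELUNKING_OPTIONS) -> str:
    """Render the given tags as aligned 'TAG = YES' lines."""
    width = max(len(name) for name in options)
    lines = [f"{name.ljust(width)} = YES" for name in options]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the fragment so it can be pasted into an existing Doxyfile.
    print(doxyfile_fragment(), end="")
```

The output is meant to be merged into an existing Doxyfile rather than used alone, because a working configuration also needs settings such as the input paths.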

Table 3. Providers available in Mac OS X.

Provider     Purpose
dtrace       Probes related to DTrace itself
fbt          Entry and exit points for functions
io           I/O probes
lockstat     Probes related to locking
plockstat    pthread lock related probes
proc         Process-specific information
profile      Profiling and performance data
syscall      Information on system calls
vminfo       Virtual memory probes
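Each provider in Table 3 is the first of the four fields that identify a probe: provider, module, function, and name. DTrace writes these as a colon-separated probe description, such as syscall::open:entry, where an empty field matches anything. The sketch below splits such a description into its parts. The `ProbeDescription` type and `parse_probe` helper are hypothetical names, and the parser is a simplification: it insists on all four fields, whereas DTrace itself also accepts shorter forms.

```python
from typing import NamedTuple


class ProbeDescription(NamedTuple):
    """The four fields that identify a DTrace probe."""
    provider: str
    module: str
    function: str
    name: str


def parse_probe(description: str) -> ProbeDescription:
    """Split 'provider:module:function:name' into its four fields.

    Empty fields are kept as empty strings; DTrace treats them as wildcards.
    """
    fields = description.split(":")
    if len(fields) != 4:
        raise ValueError(
            f"expected 4 colon-separated fields, got {len(fields)}"
        )
    return ProbeDescription(*fields)


probe = parse_probe("syscall::open:entry")
print(probe.provider, probe.function, probe.name)  # prints: syscall open entry
```

Grouping a long probe listing by the `provider` field is one simple way to narrow thousands of probes down to the handful relevant to a given question.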

 

immediately usable, and in reality, having such an embarrassment of riches makes picking the most useful probes for a particular job difficult. A probe is some piece of code in an application, library, or the operating system that can be instrumented to record information on behalf of the user. The probes are broken down into several categories based on what they record. Each

probe is delineated by its Provider, Module, Function, and Name. Providers are named after systems such as io, lockstat, proc, profile, syscall, vminfo, and dtrace itself. There are several distinct providers available in Mac OS X, although naively printing them all will show you that several exist on a per-process basis. The per-process probes show information on core data within
