challenge is the move by some developers away from a
tool-based approach to an all-in-one approach.
A tool-based approach can best be understood by
looking at the programs available on any Unix-like system. The use of several programs, mixed and matched, to
complete a task has several obvious benefits that are well
documented by others. When working with large code
bases, the drawbacks of an all-in-one approach, such as an
IDE, become clearer. A system such as the FreeBSD
kernel is already several hundred megabytes of text.
Processing that code base with tools such as Cscope and
global in order to make it more easily navigable generates
a further 175 MB of data. Although 175 MB of data may
be small in comparison with the memory of the average desktop or laptop, which routinely comes with 2 GB
to 4 GB of RAM, storing all that state in memory while
processing leads to lower performance in whatever tool is
being used. The pipeline processing of data, which keeps
in-memory data small, improves the responsiveness
of the tools involved. Loading the FreeBSD kernel into
Eclipse took quite a long time and then took up several
hundred megabytes of RAM. I have seen similar results
with other IDEs on other large code bases.
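The pipeline idea can be sketched in a few lines of Python. This is only an illustration, not how Cscope or global work internally: each stage is a generator, so only one line of source is held in memory at a time, much as in a chain of Unix tools connected by pipes. The function names (source_files, numbered_lines, matching, find_symbol) and the choice of .c and .h suffixes are assumptions made for the example.

```python
import os
import re
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, int, str]  # (path, line number, line text)


def source_files(root: str, suffixes=(".c", ".h")) -> Iterator[str]:
    """Yield paths of source files under root, one at a time."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(suffixes):
                yield os.path.join(dirpath, name)


def numbered_lines(paths: Iterable[str]) -> Iterator[Record]:
    """Yield (path, line number, text) lazily; no file is read in whole."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                yield path, lineno, line.rstrip("\n")


def matching(records: Iterable[Record], pattern: str) -> Iterator[Record]:
    """Pass through only the records whose text matches the pattern."""
    regex = re.compile(pattern)
    for path, lineno, text in records:
        if regex.search(text):
            yield path, lineno, text


def find_symbol(root: str, symbol: str) -> Iterator[Record]:
    """Compose the stages, roughly like find | xargs grep -n -w."""
    whole_word = r"\b" + re.escape(symbol) + r"\b"
    return matching(numbered_lines(source_files(root)), whole_word)
```

Iterating over find_symbol(root, name) for a source tree and an identifier of your choosing prints nothing until the first match is found and never builds an index, which is the trade-off: memory stays small, but every query rereads the tree.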
An even larger challenge looms for those who work on
not only large but also heterogeneous code bases. Most
Web sites today are a mélange of PHP or Python with C or
C++ extensions, using MySQL or PostgreSQL as a database back end, all on top of an operating system written
in C. Tracking down particularly
difficult problems often requires crossing language barriers
several times: from PHP into C++ and then into SQL,
then perhaps back to C or C++. Thus far, I have seen no
evidence of tools that understand how to analyze these
cross-language interactions.
The area that deserves the most attention is visualization. Of all the tools reviewed, only Doxygen generates
interesting and usable visual output. The other tools
have a very narrow, code-based focus in which the user is
usually looking at only a small part of the system being
investigated.
Working in this way is a bit like trying to understand
the United States by staring at a street sign in New York
City. The ability to look at a high-level representation of
the underlying system without the fine details would be
perhaps the best tool for the code spelunker. Being able
to think of software as a map that can be navigated in
different ways—for example, by class relations and call
graphs—would make code spelunkers far more productive.
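As a rough sketch of what such a map might look like, the following Python collapses a function-level call graph into a module-level one and writes it out in the DOT language that Graphviz reads. The call graph, the function-to-module mapping, and the helper names collapse and to_dot are all invented for illustration; a real tool would have to extract the edges from the code itself.

```python
from typing import Dict, Iterable, Set


def collapse(call_graph: Dict[str, Iterable[str]],
             module_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """Roll function-level call edges up into module-level edges."""
    coarse: Dict[str, Set[str]] = {}
    for caller, callees in call_graph.items():
        src = module_of[caller]
        targets = coarse.setdefault(src, set())
        for callee in callees:
            dst = module_of[callee]
            if dst != src:  # hide calls that stay inside one module
                targets.add(dst)
    return coarse


def to_dot(graph: Dict[str, Iterable[str]], name: str = "callgraph") -> str:
    """Render a graph as DOT text, with nodes and edges in sorted order."""
    lines = [f"digraph {name} {{"]
    for node in sorted(graph):
        lines.append(f'    "{node}";')
        for target in sorted(graph[node]):
            lines.append(f'    "{node}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical data: four functions spread over three modules.
calls = {
    "main": ["parse_args", "run"],
    "parse_args": [],
    "run": ["send_packet"],
    "send_packet": [],
}
modules = {"main": "cli", "parse_args": "cli", "run": "core", "send_packet": "net"}

overview = to_dot(collapse(calls, modules))
```

Saving overview to a file and running dot -Tsvg on it yields a three-node picture (cli, core, net) in place of the four functions, which is the kind of zoomed-out view the paragraph above asks for; the detailed graph remains available by passing calls to to_dot directly.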
One last area that has not been covered is the network. Network spelunking, the ability to understand an
application based on its network traffic, is still in a very
nascent state, with tools such as Wireshark being the state
of the art. Many applications are already running online,
and being able to understand and work with them at the
network level is very important.
REFERENCES
1. Cscope man page; http://cscope.sourceforge.net/cscope_man_page.html.
2. Doxygen Web site; http://www.stack.nl/~dimitri/doxygen/.
3. GNU global source-code tag system (Apr. 21, 2008). Tama Communications Corporation; http://tamacom.com.
4. gprof; http://www.gnu.org/manual/gprof-2.9.1/gprof.html.
5. Graphviz Web site; http://www.graphviz.org/.
6. Jones, T., Tauferner, A., Inglett, T. 2007. HPC system call usage trends. Linux Clusters Institute; http://www.linuxclustersinstitute.org/conferences/archive/2007/PDF/jones_21421.pdf.
7. ktrace: standard tool on open-source operating systems.
8. Neville-Neil, G.V. 2003. Code spelunking: Exploring cavernous code bases. ACM Queue 1(6): 42-48. http://doi.acm.org/10.1145/945131.945136.
9. Sun Microsystems. 2005. How to Use DTrace; http://www.sun.com/software/solaris/howtoguides/dtrace-howto.jsp.
10. Sun Microsystems. 2005. Solaris Dynamic Tracing Guide; http://docs.sun.com/app/docs/doc/817-6223.
11. Truss is available on Solaris.
12. Wright, G.R., Stevens, W.R. 1995. TCP/IP Illustrated, Vol. 2: The Implementation. Boston, MA: Addison-Wesley Professional.
GEORGE V. NEVILLE-NEIL (gnn@acm.org) is a columnist
for Communications of the ACM and ACM Queue, as well as a
member of the Queue Editorial Board. He works on networking and operating system code and teaches courses on various subjects related to programming.
© 2008 ACM 1542-7730/08/1100 $5.00
This article appeared in print in the October 2008 issue of
Communications of the ACM.
ACM QUEUE November/December 2008