challenge is the move by some developers away from a tool-based approach to an all-in-one approach.

A tool-based approach can best be understood by looking at the programs available on any Unix-like system. The use of several programs, mixed and matched, to complete a task has several obvious benefits that are well documented by others. When working with large code bases, the drawbacks of an all-in-one approach, such as an IDE, become a bit clearer. A system such as the FreeBSD kernel is already several hundred megabytes of text. Processing that code base with tools such as Cscope and global in order to make it more easily navigable generates a further 175 MB of data. Although 175 MB may be small in comparison with the 2 GB to 4 GB of RAM that the average desktop or laptop routinely comes with, storing all that state in memory while processing leads to lower performance in whatever tool is being used. The pipeline processing of data, which keeps in-memory data small, improves the responsiveness of the tools involved. Loading the FreeBSD kernel into Eclipse took quite a long time and then took up several hundred megabytes of RAM. I have seen similar results with other IDEs on other large code bases.
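The pipeline idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how Cscope or global work internally: each stage is a generator that handles one line at a time, so memory use stays roughly constant no matter how large the input is. The function names (`numbered`, `matching`, `find_symbol`) and the sample symbol are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

Record = Tuple[int, str]


def numbered(lines: Iterable[str]) -> Iterator[Record]:
    """Stage 1: attach a 1-based line number to each line as it arrives."""
    for number, line in enumerate(lines, start=1):
        yield number, line.rstrip("\n")


def matching(records: Iterable[Record], symbol: str) -> Iterator[Record]:
    """Stage 2: pass along only the lines that mention symbol as a whole word."""
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    for number, text in records:
        if pattern.search(text):
            yield number, text


def find_symbol(lines: Iterable[str], symbol: str) -> Iterator[Record]:
    """Chain the stages; nothing is read until the caller asks for a result."""
    return matching(numbered(lines), symbol)
```

Because an open file object is itself an iterable of lines, `find_symbol(open_file, "tcp_input")` would scan a source file without ever holding more than one line of it in memory, which is the same property that makes a shell pipeline responsive on a large code base.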

An even larger challenge looms for those who work on not only large but also heterogeneous code bases. Most Web sites today are a mélange of PHP or Python with C or C++ extensions, using MySQL or PostgreSQL as a database back end, all on top of an operating system written in C. It is often the case that tracking down particularly difficult problems requires crossing language barriers several times—from PHP into C++ and then into SQL, then perhaps back to C or C++. Thus far, I have seen no evidence of tools that understand how to analyze these cross-language interactions.

The area that deserves the most attention is visualization. Of all the tools reviewed, only Doxygen generates interesting and usable visual output. The other tools have a very narrow, code-based focus in which the user is usually looking at only a small part of the system being investigated.

Working in this way is a bit like trying to understand the United States by staring at a street sign in New York City. The ability to look at a high-level representation of the underlying system without the fine details would be perhaps the best tool for the code spelunker. Being able to think of software as a map that can be navigated in different ways—for example, by class relations and call graphs—would make code spelunkers far more productive.
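As a small illustration of treating software as a map, the sketch below turns a table of caller-to-callee relationships into Graphviz DOT text. It assumes the call relationships have already been extracted by some other tool and are available as a dictionary; the function name `to_dot` and the function names in the example data are invented for illustration.

```python
from typing import Dict, List


def to_dot(calls: Dict[str, List[str]], name: str = "callgraph") -> str:
    """Render a caller -> callees mapping as a Graphviz directed graph."""
    lines = [f"digraph {name} {{"]
    for caller in sorted(calls):
        for callee in sorted(calls[caller]):
            # One edge per call relationship; names are quoted for safety.
            lines.append(f'    "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical call relationships, made up for this example.
example_calls = {
    "main": ["parse_args", "run"],
    "run": ["read_input", "report"],
}

if __name__ == "__main__":
    print(to_dot(example_calls))
```

The resulting text can be saved to a file and rendered with Graphviz, for example with `dot -Tsvg`, giving a high-level picture of the system without the fine details of each function.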

One last area that has not been covered is the network. Network spelunking, the ability to understand an application based on its network traffic, is still in a very nascent state, with tools such as Wireshark being the state of the art. Many applications are already running online, and being able to understand and work with them at the network level is very important.

REFERENCES

1. Cscope man page; http://cscope.sourceforge.net/cscope_man_page.html.

2. Doxygen Web site; http://www.stack.nl/~dimitri/doxygen/.

3. GNU global source-code tag system (Apr. 21, 2008). Tama Communications Corporation; http://tamacom.com.

4. gprof; http://www.gnu.org/manual/gprof-2.9.1/gprof.html.

5. Graphviz Web site; http://www.graphviz.org/.

6. Jones, T., Tauferner, A., Inglett, T. 2007. HPC system call usage trends. Linux Clusters Institute; http://www.linuxclustersinstitute.org/conferences/archive/2007/PDF/jones_21421.pdf.

7. ktrace: standard tool on open-source operating systems.

8. Neville-Neil, G.V. 2003. Code spelunking: Exploring cavernous code bases. ACM Queue 1(6): 42-48; http://doi.acm.org/10.1145/945131.945136.

9. Sun Microsystems. 2005. How to Use DTrace; http://www.sun.com/software/solaris/howtoguides/dtrace-howto.jsp.

10. Sun Microsystems. 2005. Solaris Dynamic Tracing Guide; http://docs.sun.com/app/docs/doc/817-6223.

11. Truss is available on Solaris.

12. Wright, G.R., Stevens, W.R. 1995. TCP/IP Illustrated, Vol. 2: The Implementation. Boston, MA: Addison-Wesley Professional.

GEORGE V. NEVILLE-NEIL (gnn@acm.org) is a columnist for Communications of the ACM and ACM Queue, as well as a member of the Queue Editorial Board. He works on networking and operating system code and teaches courses on various subjects related to programming.

© 2008 ACM 1542-7730/08/1100 $5.00

This article appeared in print in the October 2008 issue of Communications of the ACM.

ACM QUEUE November/December 2008 33
