these tools, sort of in the way that early
humans learned to cook flesh. Let’s
just say that though the results may
have been edible, they were not winning any Michelin stars.
Using a packet-capture tool is, to
a networking person, somewhat like
using a thermometer is to a parent. It
is likely that if you ever felt sick when
you were a child at least one of your
parents would take your temperature.
If they took you to the doctor, the doctor would also take your temperature.
I once had my temperature taken for a
broken ankle—crazy, yes, but that doctor gave the best prescriptions, so I just
smiled blithely and let him have his
fun. That aside, taking a child’s temperature is the first thing on a parent’s
checklist for the question “Is my child
sick?” What on earth does this have to
do with capturing packets?
By far the best tool for determining
what is wrong with programs that use
a network, or even the network itself, is
the tcpdump tool. Why is that? Surely
in the now 40-plus years since packets were first transmitted across the
original ARPANET we have developed
some better tools. The fact is we have
not. When something in the network
breaks, you want to be able to see the
messages at as many layers as possible.
The other key component in debugging network problems is understanding the timing of what happens, which
a good packet-capture program also
records. Networks are perhaps the
most nondeterministic components
of any complex computing system.
Finding out who did what to whom
and when (another question parents
often ask, usually after a fight among
siblings) is extremely important.
All network protocols, and the programs that use them, have some sort
of ordering that is important to their
functioning. Did a message go missing? Did two or more messages arrive
out of order at the destination? All of
these questions can potentially be
answered by using a packet sniffer to
record network traffic, but only if you
use it!
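
To make the ordering questions concrete, here is a minimal sketch of the kind of check a capture lets you run. It assumes you have already pulled a timestamp and an application-level sequence number out of each captured packet. The `Record` type, its `seq` field, and the sample data are hypothetical, invented for illustration, and not part of any particular protocol or tool.

```python
from typing import NamedTuple


class Record(NamedTuple):
    timestamp: float  # capture time in seconds (hypothetical field)
    seq: int          # application-level sequence number (hypothetical field)


def check_ordering(records):
    """Return (missing, out_of_order) for records listed in arrival order."""
    seen = set()
    out_of_order = []
    highest = None
    for rec in records:
        # A sequence number lower than one already seen arrived late.
        if highest is not None and rec.seq < highest:
            out_of_order.append(rec.seq)
        if highest is None or rec.seq > highest:
            highest = rec.seq
        seen.add(rec.seq)
    if not seen:
        return [], []
    # Any gap between the lowest and highest number seen never arrived.
    missing = [s for s in range(min(seen), max(seen) + 1) if s not in seen]
    return missing, out_of_order


# Made-up sample data: message 3 arrives after message 4, message 5 never arrives.
capture = [
    Record(0.001, 1),
    Record(0.004, 2),
    Record(0.009, 4),
    Record(0.010, 3),
    Record(0.020, 6),
]
missing, late = check_ordering(capture)
print("missing:", missing)
print("arrived late:", late)
```

The timestamps are not used by the check itself; they are what you would look at next to see how long the late message was delayed.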
It’s also important to record the network traffic as soon as you see the problem. Because of their nondeterministic
nature, networks give rise to the worst
types of timing bugs. Perhaps the bug
happens only every so many hours, because
of a rollover in a large counter;
you really want to start recording the
network traffic before the bug occurs,
not after, because it may be many hours
until the condition comes up again.
So, here are some very basic recommendations on using a packet sniffer
in debugging a network problem. First,
get permission (yes, it really is KV giving you this advice). People get cranky
if you record their network traffic, such
as instant messages, email, and banking transactions, and then post it to a
mailing list. Just because some person
in IT was dumb enough to give you root
or admin rights on your desktop does
not mean you should just record everything and send it off.
Next, record only as much information as you need to debug the problem.
If you’re new at this you’ll probably
have the program suck up every packet
so you don’t miss anything, but that’s
problematic for two reasons: the first
is the previously mentioned privacy issue; and the second is that if you record
too much data, finding the bug will be
like finding a needle in a haystack—
only you’ve never seen a haystack that big.
Recording an hour of Ethernet traffic
on your LAN can capture a few hundred
million packets. No matter how good a
tool you have, it’s going to do a much
better job at finding a bug if you narrow
down the search.
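
One way to narrow the search is to hand tcpdump a capture filter so that only the traffic you care about is recorded. The sketch below only builds and prints a command line; it does not run anything. The interface name, host address, port, and output path are placeholders chosen for illustration, and the helper `build_capture_command` is hypothetical.

```python
import shlex


def build_capture_command(interface, host, port, outfile):
    """Build a tcpdump argument list that records one host/port conversation."""
    # Capture filter: only TCP traffic to or from one host on one port.
    capture_filter = f"host {host} and tcp port {port}"
    return [
        "tcpdump",
        "-n",             # do not resolve addresses to names
        "-i", interface,  # interface to listen on
        "-w", outfile,    # write raw packets to a file instead of printing them
        capture_filter,
    ]


# Placeholder values; substitute the ones that match your own problem.
cmd = build_capture_command("eth0", "192.0.2.10", 443, "/var/tmp/bug.pcap")
print(shlex.join(cmd))
```

The output file here is deliberately on a local disk. Running the printed command normally requires elevated privileges, which is one more reason to get permission first.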
If you do record a lot of data, don’t
try to share it all as one huge chunk.
See how these points follow each other? Most packet-capture programs
have options to say, “Once the capture file is full, close it and start a new
one.” Limiting files to one megabyte is
a nice start.
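
With tcpdump, the "close it and start a new one" behavior comes from the `-C` option, which takes a size in millions of bytes; adding `-W` caps the number of files so the capture becomes a ring that overwrites its oldest file. The following is a sketch that builds such a command line without running it. The helper name, interface, and file path are assumptions made for illustration.

```python
def build_rotating_command(interface, basename, megabytes=1, keep_files=None):
    """Build a tcpdump argument list that rotates its capture file by size."""
    cmd = [
        "tcpdump",
        "-n",
        "-i", interface,
        "-w", basename,        # later files get a numeric suffix appended
        "-C", str(megabytes),  # start a new file after about this many million bytes
    ]
    if keep_files is not None:
        # With -C, -W limits the file count and overwrites the oldest file.
        cmd += ["-W", str(keep_files)]
    return cmd


# Placeholder values: one-megabyte files, keeping only the 20 most recent.
print(" ".join(build_rotating_command("eth0", "/var/tmp/bug.pcap", 1, 20)))
```

A bounded ring of small files is also a practical way to leave a capture running while waiting for a bug that shows up only every few hours.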
Finally, do not record your data on a
network file system. There is no better
way to ruin a whole set of packet-capture files than by having them capture
themselves.
So there you have it: a brief introduction to capturing data so you can
debug a networking problem. Perhaps
now you can get yelled at on a mailing
list for something more egregious than
not taking your network’s temperature
before calling the doctor.
KV
Dedication
I would like to dedicate this column
to my first editor, Mrs. B. Neville-Neil,
who passed away after a sudden illness
on December 9th, 2009; she was 65
years old.
My mother took language, both written and spoken, very seriously. The last
thing I wanted to hear upon showing
her an essay I was writing for school
was, “Bring me the red pen.” In those
days I did not have a computer; all my
assignments were written longhand
or on a typewriter and so the red pen
meant a total rewrite. She was a tough
editor, but it was impossible to question
the quality of her work or the passion
that she brought to the writing process.
All of the things Strunk and White have
taught others throughout the years my
mother taught me, on her own, with the
benefit of only a high school education
and a voracious appetite for reading.
It is, in large part, due to my mother’s influence that I am a writer today.
It is also due to her influence that I review articles, books, and code on paper, using a red pen. Her edits and her
unswerving belief that I could always
improve are, already, keenly missed.
—George Vernon Neville-Neil III