or new programs that are executed
when the system is shut down or rebooted. In this way, if people are insta-booting your machines before you can
get to them, you can make it so their reboot command does your bidding. You
can even go so far as to rig your operating system to produce a kernel core
dump on each reboot. This gives you a
snapshot of the system as it was when
it was broken, which you can go back
to later and pick through. I warn you,
though, that picking through a kernel
core dump is about as much fun as
picking fleas off a dog.
The only downside to collecting all
this data is analyzing it. Since it is no
longer really necessary to delete data,
you may wind up spending a good
deal of time organizing it into trees of
trees of files. I offer a couple of quick
suggestions. Do not make the tree
scheme too difficult to traverse, either for a person or a program. It can
take a very long time to access a ton
of files in deep trees, due to the cost
of traversing the directory trees. Keep
things simple for both yourself and
your analysis programs. Before you
start, have a plan for what you want to
store, where you want to store it, and
how you plan to access it. Most people
put this kind of thought into their applications, but not enough into how
and where they store logs or other
runtime information generated by
their systems. You should put at least
half as much time into the latter as
you do into the former.
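As one illustration of a plan that is simple for both people and programs, here is a minimal Python sketch of a shallow naming scheme: one directory per host, one per day, and the tool name and time in the file name. The function names, the root directory, and the layout itself are assumptions made for the example, not a scheme prescribed here.

```python
from datetime import datetime
from pathlib import PurePosixPath


def snapshot_path(root, host, tool, when):
    """Build root/host/YYYY-MM-DD/tool-HHMMSS.txt (two levels below root)."""
    day = when.strftime("%Y-%m-%d")
    clock = when.strftime("%H%M%S")
    return PurePosixPath(root) / host / day / f"{tool}-{clock}.txt"


def parse_snapshot_path(path):
    """Recover host, day, tool, and time from a path built by snapshot_path."""
    p = PurePosixPath(path)
    # The tool name may itself contain '-', so split on the last one only.
    tool, _, clock = p.stem.rpartition("-")
    return {
        "host": p.parent.parent.name,
        "day": p.parent.name,
        "tool": tool,
        "time": clock,
    }


# Hypothetical usage: the root directory and host name are placeholders.
example = snapshot_path("/var/log/snapshots", "mail1", "netstat",
                        datetime(2024, 1, 2, 3, 4, 5))
# str(example) == "/var/log/snapshots/mail1/2024-01-02/netstat-030405.txt"
```

Because everything a program needs is in the path, an analysis script can select a host and a day without opening any files, and a person can find the same data with nothing more than ls.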
KV
Related articles
on queue.acm.org
Kode Vicious Gets Dirty
George Neville-Neil
http://queue.acm.org/detail.cfm?id=1071723
High Performance Web Sites
Steve Souders
http://queue.acm.org/detail.cfm?id=1466450
Erlang for Concurrent Programming
Jim Larson
http://queue.acm.org/detail.cfm?id=1454463
George V. Neville-Neil (kv@acm.org) is the proprietor of
Neville-Neil Consulting and a member of the ACM Queue
editorial board. He works on networking and operating
systems code for fun and profit, teaches courses on
various programming-related subjects, and encourages
your comments, quips, and code snips pertaining to his
Communications column.
Copyright held by owner/author(s).
Everyone has a naming horror
story. My first was at a university
where the hosts were named after rivers. That would have been fine if you
could remember how to spell “Seine,”
but once you run out of nice short
names, you get to “Mississippi” and
“Dnjeper.” That is what I want to do
when I remotely log in to a host: I want
to think in my head, “M-I crooked
letter crooked letter I crooked letter crooked letter I hump back hump
back I,” which is how I and many other
American schoolchildren learned to
spell Mississippi. I could go on and
on about this, but then I would sound
like those systems administrators I
mentioned who were lamenting past
host names. Here, therefore, is a short
guide to picking host names.
A name you are going to use on a
daily basis must be easy to type. That
means no silent letters, such as in
Dnjeper, and nothing that is too long, like
thisisthehostthatjackbuilt.
It is a good idea to choose names that
everyone you work with can pronounce.
With globalization, finding pronounceable names has become more difficult,
since some people cannot pick up L vs.
R, or understand whether you just used
a double o or a single o, and diphthongs
will kill you (no, diphthongs are not a
new Brazilian bathing suit). The main
point here is to avoid picking a name
with a lot of sounds that are difficult
to translate into typing. Typing is still
faster than using a voice-recognition
system; so remember, these names will
have to be typed.
If you are going to use services as
names, make sure you can replace the
systems behind the names without hiccups. It should be obvious that everyone is going to be annoyed if they have
to use mail2.yourdomain.com when
mail.yourdomain.com goes down.
(This point is not really about naming,
because any systems administrator
worth his or her paycheck can build a
system like this; but I have seen it done
the wrong way, so I wanted to state it
for the record.)
Avoid at all costs having two different,
unrelated names for the same
thing. In fact, this is true in code and
host names. If you have two similar services
and you want two different names,
make it completely obvious how to map
one name to the other and back. It is
maddening to have the kind of back
and forth where one person asks,
“Hey, can I reboot fibble?”
“Yes.”
And then someone asks,
“Who rebooted mail1?”
“But I didn’t know it was mail1; I
thought it was fibble.”
Finally, try to avoid being cute. I
know that giving this piece of advice is
basically tilting at windmills, but I have
to say that people who name their mail
servers male and female make my normally
icy blood boil.
KV
Dear KV,
One of my company’s frontline engineers—in the group that looks at the
live traffic hitting our switches and
servers—keeps reporting problems,
and then, before anyone can look at
the server that is having issues, reboots
the system to clear the problem. How
do you explain to someone there is information that needs to be collected
when the system is misbehaving that is
absolutely vital to finding and solving
the problem?
Booted
Dear Booted,
I would start by standing with my foot
on this person’s chest and yelling,
“There is information that needs to be
collected when the system is misbehaving that is absolutely vital to finding
and solving the problem.” But I take it
you have tried that already, though perhaps without enough screaming.
True, systems tend to build up state
during execution that is not written to
some permanent storage often enough.
The problem you need to solve is not so much
preventing the person from insta-booting a misbehaving machine
as making sure there is a good,
searchable record of what the system is
doing when it is running. Most system-monitoring tools on modern servers
generate plain text output. It is a simple matter to write scripts that execute
periodically to write the output of these
tools—such as procstat, netstat, iostat,
and the like—into files that will be preserved across reboots.
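A minimal sketch of such a script, in Python, might look like the following. The function name, the default command list and its flags, and the file-naming pattern are all assumptions made for illustration; substitute whatever tools and options your platform actually provides.

```python
import subprocess
import time
from pathlib import Path

# Assumed defaults for illustration only; tools and flags vary by platform.
DEFAULT_COMMANDS = {
    "netstat": ["netstat", "-an"],
    "iostat": ["iostat"],
}


def snapshot(commands, out_dir, now=None):
    """Run each command once and save its text output to a timestamped file.

    commands maps a short name to an argument list. Returns the paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # now=None means "use the current time".
    stamp = time.strftime("%Y%m%dT%H%M%S", time.localtime(now))
    written = []
    for name, argv in commands.items():
        path = out_dir / f"{name}-{stamp}.txt"
        try:
            result = subprocess.run(argv, capture_output=True, text=True,
                                    errors="replace", timeout=30)
            body = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure; a missing tool is itself useful information.
            body = f"failed to run {argv!r}: {exc}\n"
        path.write_text(body)
        written.append(path)
    return written
```

Calling snapshot(DEFAULT_COMMANDS, some_directory) from a periodic job, such as a cron entry, leaves one file per tool per run on disk, so the record is still there after someone reboots the machine. The output directory must live on persistent storage rather than a memory-backed file system for that to hold.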
For more pernicious problems, you
can write your own tools, either scripts