A koder with attitude, KV answers your questions. Miss Manners he ain’t.
What is the proper way to debug malfunctioning hardware?
Hard Up Against a Bug
Dear Hard Up,
I suggest taking a very sharp knife and cutting the board
traces at random until the thing either works, or smells
funny! I gather you’re not asking the same question that
led me to use the word “changeineer” in another column
(“Permanence and Change,” Communications of the ACM,
December 2008). I figure you have an actually malfunctioning piece of hardware and that you’ve already
sent three previous versions back to the manufacturer,
complete with nasty letters containing veiled references
to legal action should they continue to send you broken hardware.
Along with race conditions, a subject for another
time, hardware problems are probably the most difficult
things to figure out. While hardware engineers may scoff
at software engineers with screwdrivers, if you want to
make them truly afraid, get out a logic analyzer or a scope
and hook it up to their board. Most software engineers
are not, alas, trained in using logic analyzers—or even in
basic electrical engineering—so you will have to content
yourself with poking at the board through whatever software the board vendor or operating-system vendor has provided.
Believe it or not—and I am sure if you’re a typical software engineer you won’t want to hear this—the best place
to start is with the hardware vendor’s documentation.
Of course, many hardware vendors take as dim a view of
6 November/December 2008 ACM QUEUE
documentation as software vendors do. The quality of the
documentation I have seen has run the gamut from unusably terrible all the way up to “bang my head on the desk
and cry.” Rarely have I seen hardware documentation
that was both correct and had a structure that made sense
to anyone but the people who originally put it together.
Happily, it is rare these days to be able to completely
destroy a piece of hardware by putting the wrong value
into the wrong memory location; the days of exploding
computers à la the original “Star Trek” are still a couple of
centuries in the future.
All devices have problems,
but the ones that get fixed
are the ones that have
resources behind them.
That being said, it is definitely possible to cause damage to hardware via software, or, more commonly, to
mask whatever problem you were having by tripping
some seemingly unrelated bit of configuration magic in
the device. Not that KV is against magic; it’s just that he
tends not to trust it... at all.
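One way to catch that kind of accidental magic is to dump the device's registers before and after you poke at it, then compare the two dumps. What follows is only a minimal sketch of the idea, not any vendor's tool: the function name diff_registers, the register offsets, and the values are all made up for illustration, and how you actually read the registers depends entirely on your board and its driver.

```python
def diff_registers(before, after):
    """Compare two register snapshots (dicts of offset -> value).

    Returns {offset: (old, new, flipped_bits)} for every register that changed.
    """
    changes = {}
    for offset in sorted(set(before) | set(after)):
        old, new = before.get(offset), after.get(offset)
        if old == new:
            continue
        # XOR marks exactly the bits that flipped; a register that appears
        # in only one snapshot gets None instead of a bit mask.
        flipped = old ^ new if old is not None and new is not None else None
        changes[offset] = (old, new, flipped)
    return changes


# Hypothetical snapshots: these offsets and values are invented.
before = {0x00: 0x0001, 0x04: 0x00F0, 0x08: 0x8000}
after = {0x00: 0x0001, 0x04: 0x00F4, 0x08: 0x0000}

for offset, (old, new, flipped) in diff_registers(before, after).items():
    print(f"reg 0x{offset:02x}: 0x{old:04x} -> 0x{new:04x} (flipped 0x{flipped:04x})")
```

If you meant to touch one register and the comparison shows two that changed, the second is the first place to look.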
If you’re lucky, you have the documentation for the
system, or can get the lawyer where you work to send a
nondisclosure agreement and a letter to the vendor to get
whatever it’s willing to give you.
Read the documentation first. Really, trust me on
this. It may be completely useless in the end but it may
also save you a lot of time if you find just the right bit of
information in the docs. I tend to read over all the available registers and configuration options, of which there