kode vicious
A koder with attitude, KV answers your questions. Miss Manners he ain’t.
Debugging Devices
Dear KV,
What is the proper way to debug malfunctioning hardware?
Hard Up Against a Bug
Dear Hard Up,
I suggest taking a very sharp knife and cutting the board traces at random until the thing either works, or smells funny! I gather you’re not asking the same question that led me to use the word “changeineer” in another column (“Permanence and Change,” Communications of the ACM, December 2008). I figure you have an actually malfunctioning piece of hardware and that you’ve already sent three previous versions back to the manufacturer, complete with nasty letters containing veiled references to legal action should they continue to send you broken products.
Along with race conditions, a subject for another time, hardware problems are probably the most difficult things to figure out. While hardware engineers may scoff at software engineers with screwdrivers, if you want to make them truly afraid, get out a logic analyzer or a scope and hook it up to their board. Most software engineers are not, alas, trained in using logic analyzers—or even in basic electrical engineering—so you will have to content yourself with poking at the board through whatever software the board vendor or operating-system vendor has provided you.
Believe it or not (and I am sure if you’re a typical software engineer you won’t want to hear this), the best place to start is with the hardware vendor’s documentation. Of course, many hardware vendors take as dim a view of documentation as software vendors do. The quality of the documentation I have seen has run the gamut from unusably terrible all the way up to “bang my head on the desk and cry.” Rarely have I seen hardware documentation that was both correct and had a structure that made sense to anyone but the people who originally put it together. Happily, it is rare these days to be able to completely destroy a piece of hardware by putting the wrong value into the wrong memory location; the days of exploding computers à la the original “Star Trek” are still a couple of centuries in the future.
HAVE A QUESTION FOR KODE VICIOUS?
E-mail him at kv@acmqueue.com. If your question appears in his column, we’ll send you a rare piece of authentic Queue memorabilia. We edit e-mails for style, length, and clarity.
6 November/December 2008 ACM QUEUE
All devices have problems, but the ones that get fixed are the ones that have good engineering resources behind them.
That being said, it is definitely possible to cause damage to hardware via software, or, more commonly, to mask whatever problem you were having by tripping some seemingly unrelated bit of configuration magic in the device. Not that KV is against magic; it’s just that he tends not to trust it... at all.
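One cheap defense against that kind of accidental magic is to dump the device’s registers before and after you poke at it, then compare the two dumps, so that any bit you did not mean to flip shows up right away. What follows is only a minimal sketch of the idea: the register names and values are made up for illustration, and it assumes you already have some way (a vendor tool, a driver debug interface) to get each dump into a name-to-value mapping.

```python
# Hypothetical register names and values; a real device's names and layout
# come from the vendor's documentation or its diagnostic tool.

def diff_registers(before, after):
    """Return {name: (old, new, changed_bits)} for registers whose value differs.

    Only registers present in both dumps are compared.
    """
    changes = {}
    for name in sorted(set(before) & set(after)):
        old, new = before[name], after[name]
        if old != new:
            # XOR leaves a 1 in every bit position that flipped.
            changes[name] = (old, new, old ^ new)
    return changes


# Dump taken before touching the device.
before = {"CTRL": 0x0001, "STATUS": 0x0080, "INTR_MASK": 0x00FF}
# Dump taken after a write that was only supposed to change CTRL.
after = {"CTRL": 0x0003, "STATUS": 0x0080, "INTR_MASK": 0x00F7}

for name, (old, new, bits) in diff_registers(before, after).items():
    print(f"{name}: {old:#06x} -> {new:#06x} (changed bits {bits:#06x})")
```

In this made-up example the write was aimed at CTRL, yet INTR_MASK changed as well, which is exactly the sort of side effect worth chasing down before you declare the original problem fixed.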
If you’re lucky, you have the documentation for the system, or can get the lawyer where you work to send a nondisclosure agreement and a letter to the vendor to get whatever it’s willing to give you.
Read the documentation first. Really, trust me on this. It may be completely useless in the end but it may also save you a lot of time if you find just the right bit of information in the docs. I tend to read over all the available registers and configuration options, of which there