firewall rules. The front-end did not test
two-way communication, so that errors
in the front-end port configuration were
not reported until the middleware server
was configured. The error message certainly did not help. Perhaps most important was the fact that for most of the
troubleshooting session, George was the
only one who had direct access to the system. All the other participants got their
information filtered through George.
Examining the videotapes in detail, we discovered several instances
in which George misreported or misunderstood what he saw, filtering the
information through his own misunderstanding, and reporting back incorrectly. (One example occurred when
George misread the results of a network
trace, his misunderstanding filtering
out a critical clue.) This prevented Adam
and tech support from helping him effectively. The problem was found only
when Ted looked at the machine state
independently—and then he had to debug George, too. George had many tools
for sharing information about system
state, but none of them gave the whole
picture to the others.
What are the lessons? Collaboration
is critical, especially when misunder-
standings occur (and from what we saw,
incorrect or incomplete understanding
of highly complex systems is a common
source of problems for sysadmins). Yet
collaboration can work only when cor-
rect information is shared, something
that is impeded by misunderstandings
and the limitations of communication
tools. Proper system design can help
avoid misunderstandings in the first
place, and improved tools for sharing
information could help more quickly
rectify misunderstandings when they
occur.
figure 3. accounting of time spent during George’s troubleshooting session.
Tool Usage
Topics Discussed
Instant
Messenger
Face-to-face
Email
Web
Command
Line
Phone
Command
Collaboration
Administrative
Tools
State
Strategy
Personal
Log
Error
Documentation
Configuration