˲ You do not trust the measuring device completely; or
˲ There is a dependence on something that prevents the measurement from being made.
If you believe in classical first-order logic, any assertion must be either true or false, but in an indeterminate world, under any of these conditions, you simply do not know, because there is insufficient information from which to choose either true or false. The system has only two states, but you cannot know which of them is the case. Moreover, suppose you measure at some time t; how much time must elapse before you can no longer be certain of the state?
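To make the three-valued situation concrete, here is a minimal sketch in Python (all names are hypothetical, invented for illustration): a probe's boolean observation decays to "unknown" once it is older than some trust window, rather than remaining true or false forever.

```python
# A sketch of three-valued knowledge about a system (hypothetical names):
# a probe reports True or False at measurement time, but the result
# decays to "unknown" once it is older than the trust window.
import enum
import time

class Knowledge(enum.Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"  # insufficient information to choose either

class Measurement:
    def __init__(self, value: bool, ttl_seconds: float):
        self.value = value
        self.taken_at = time.time()
        self.ttl = ttl_seconds  # how long we trust this observation

    def state(self) -> Knowledge:
        # After the trust window elapses, we can no longer be certain:
        # the system still has only two states, but we don't know which.
        if time.time() - self.taken_at > self.ttl:
            return Knowledge.UNKNOWN
        return Knowledge.TRUE if self.value else Knowledge.FALSE

# Usage: a disk-is-healthy probe trusted for 60 seconds.
m = Measurement(value=True, ttl_seconds=60)
print(m.state())  # Knowledge.TRUE now; Knowledge.UNKNOWN a minute later
```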
This situation has been seen before in, of all places, quantum mechanics. Like Schrödinger's cat, you cannot know which of the two possibilities (dead or alive) is the case without an active measurement. All you can know is the outcome of each measurement reported by a probe, after the fact. The lesson of physics, however, is that one can actually make excellent progress without complete knowledge of a system, by using guiding principles that do not depend on the uncertain details.
Back to Stability?
A system might not be fully knowable, but it can still be self-consistent. An obvious example that occurs repeatedly in nature and engineering is equilibrium. Regardless of whether you know the details underlying a complex system, you can know its stable states, because they persist. A persistent state is an appropriate policy for tools such as computers: if tools change too fast, they become useless. It is better to have a solid tool that is almost what you would like than the exact thing you want that falls apart after a single use (what you want and what you need are not necessarily the same thing). Similarly, if system administrators cannot have what they want, they can at least choose the best of what can be done.
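In system administration this is the logic of convergent maintenance. The sketch below is illustrative only (the names are invented; this is not Cfengine's actual interface): an idempotent repair moves the system toward a desired stable state, and once equilibrium is reached, running it again changes nothing.

```python
# A minimal sketch of convergent maintenance (hypothetical names):
# each repair is idempotent, so repeated application drives the system
# to its desired equilibrium and then holds it there.
desired = {"service": "running", "config_mode": "0644"}

def converge(actual: dict, desired: dict) -> dict:
    """Apply only the corrections needed; a no-op at the fixed point."""
    for key, want in desired.items():
        if actual.get(key) != want:
            print(f"repairing {key}: {actual.get(key)!r} -> {want!r}")
            actual[key] = want
    return actual

state = {"service": "stopped", "config_mode": "0600"}
state = converge(state, desired)  # repairs both deviations
state = converge(state, desired)  # equilibrium: nothing left to do
assert state == desired
```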
Systems can be stable either because they are unchanging or because many lesser changes balance out over time (maintenance). There are countless examples of very practical tools based on this idea: Lagrange points (optimization), Nash equilibrium (game theory), the Perron-Frobenius theorem (graph theory), and the list goes on. If this sounds like mere academic nonsense, then consider how much of this nonsense is in our daily lives through technologies such as Google PageRank or the Web of Trust that rely on this same idea.
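PageRank makes the point concrete: by the Perron-Frobenius theorem, repeatedly applying the link structure of a graph settles into a unique stable ranking. The toy sketch below (a made-up four-page graph and the conventional damping factor of 0.85, not anyone's production values) finds that persistent state by power iteration.

```python
# A toy power-iteration PageRank: the ranking it converges to is the
# stable fixed point guaranteed by the Perron-Frobenius theorem.
links = {          # page -> pages it links to (no dangling pages)
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
pages = list(links)
damping = 0.85
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(100):
    new = {
        p: (1 - damping) / len(pages)
           + damping * sum(rank[q] / len(links[q])
                           for q in pages if p in links[q])
        for p in pages
    }
    if max(abs(new[p] - rank[p]) for p in pages) < 1e-9:
        rank = new
        break          # equilibrium: further iteration changes nothing
    rank = new

print(rank)            # "c" ranks highest: three pages link to it
```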
It is curious that embracing uncertainty should allow you to understand something more fully, but the simple truth is that working around what you don't know is an effective and low-cost strategy for deciding what you actually can do.
Major challenges of scale and complexity haunt the industry today. We now know that scalability is about not only increasing throughput but also being able to comprehend the system as it grows. Without a model, the risk of not knowing the course you are following can easily grow out of control. Ultimately, managing the sum of knowledge about a system is the fundamental challenge: the test-driven approach is about better knowledge management, knowing what you can and cannot know.
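As one final illustration, the sketch below (all names hypothetical, not any real tool's interface) shows a check that distinguishes verified knowledge from ignorance instead of collapsing everything into pass/fail.

```python
# A sketch of a "testable administration" check (hypothetical names):
# the test passes only on knowledge we actually have; stale or missing
# observations are reported as unknown rather than guessed at.
def check(name: str, observed, expected):
    if observed is None:
        return f"{name}: UNKNOWN (no fresh measurement)"
    return f"{name}: {'PASS' if observed == expected else 'FAIL'}"

print(check("ntp_synced", True, True))    # PASS: verified knowledge
print(check("disk_healthy", None, True))  # UNKNOWN: don't pretend to know
```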
Related articles on queue.acm.org

A Plea to Software Vendors from Sysadmins—10 Do's and Don'ts
Thomas A. Limoncelli

Self-Healing in Modern Operating Systems
Michael W. Shapiro

A Conversation with Peter Tippett and Steven Hofmeyr
January 10, 2009
Mark Burgess is a professor of network and system administration, the first with this title, at Oslo University College. His current research interests include the behavior of computers as dynamic systems and applying ideas from physics to describe computer behavior. He is the author of Cfengine and is the founder, chairman, and CTO of Cfengine, Oslo, Norway.