˲ Hardware burn-in testing. You can
run extreme tests on the various hardware components in a node to
confirm that none of them faults at the onset of load. This
may not be necessary or feasible in a
cloud-compute instance.
˲ Unit testing of components. Each
service can be tested in isolation, and its configuration can be checksummed to confirm that it matches what you expect.
˲ Functional testing of integrations.
Each execution path (usually based on
an application feature) can be explored
with some form of automated procedure to confirm that it produces the expected results.
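As an illustration of the configuration check mentioned in the unit-testing item, here is a minimal Python sketch. It assumes a digest of the known-good configuration was recorded at release time; the function names (`config_digest`, `config_matches`) and the choice of SHA-256 are illustrative assumptions, not something the article prescribes.

```python
import hashlib
from pathlib import Path


def config_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def config_matches(path: Path, expected_digest: str) -> bool:
    """Return True when the file on disk matches the recorded digest.

    expected_digest is assumed to have been captured from a known-good
    copy of the configuration (a hypothetical release-time step).
    """
    return config_digest(path) == expected_digest
```

A deployment check might call `config_matches` for each configuration file and refuse to proceed on any mismatch, which catches drift before the service takes traffic.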
Traditionally, these sensible measures to gain confidence are taken
before systems or applications reach
production. Once in production, the
traditional approach is to rely on monitoring
and logging to confirm that
everything is working correctly. If the system
is behaving as expected, then you do
not have a problem. If it is not, and it
requires human intervention (troubleshooting,
triage, resolution, and so
on), then you need to react to the incident
and get things working again as
fast as possible.