Iteration 1 had the goal of monitoring a few servers to get feedback
from various stakeholders. The team
installed an open-source monitoring
system on a virtual machine. This was
in sharp contrast to their original plan
of a system that would be highly scalable. Virtual machines have less I/O
and network horsepower than physical machines. Hardware could not be
ordered and delivered in a one-week
time frame, however. So the first iteration used virtual machines. It was what
could be done by Friday.
At the end of this iteration, the team
didn’t have their dream monitoring
system, but they had more monitoring
capability than ever before.
In this iteration they learned that
SNMP (Simple Network Management
Protocol) was disabled on most of
the organization’s networking equipment. They would have to coordinate
with the network team if they were to
collect network utilization and other
statistics. It was better to learn this
now than to have the final big deployment scuttled by discovering it too late.
To work around this, the team decided
to focus on monitoring things they did
control, such as servers and services.
This gave the network team time to
create and implement a project to enable SNMP in a secure and tested way.
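As an illustration of what monitoring "things they did control" can look like at its simplest, here is a minimal sketch of a TCP reachability check for servers and services. It is not the open-source monitoring system the team installed; the function names `check_tcp_service` and `check_all`, the target tuple layout, and the two-second timeout are all assumptions made for this example.

```python
import socket
import time


def check_tcp_service(host, port, timeout=2.0):
    """Try to open a TCP connection; return (is_up, seconds_elapsed).

    Hypothetical helper for illustration only. A successful connect is
    treated as "up"; any socket-level error (refused, timed out, DNS
    failure) is treated as "down".
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            is_up = True
    except OSError:
        is_up = False
    return is_up, time.monotonic() - start


def check_all(targets, timeout=2.0):
    """Check each (name, host, port) target and return {name: is_up}."""
    return {
        name: check_tcp_service(host, port, timeout)[0]
        for name, host, port in targets
    }
```

A call such as `check_all([("web", "www.example.com", 80)])` returns a dictionary mapping each target name to `True` or `False`. A check like this needs nothing from the network team, which is the point of the workaround: it covers servers and services while SNMP-based network statistics wait for a later iteration.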
Iterations 2 and 3 proceeded well,
adding more machines and testing other configuration options and features.
During iteration 4, however, the
team noticed the other system administrators and managers had not
been using the system much. This
was worrisome. They paused to talk
one-on-one with people to get some feedback.
The team learned that without dashboards
that displayed historical data, the system wasn’t very useful to its users.
In all the past debates this issue had
never been raised. Most confessed
they had not thought it would be important until they saw the system running; others had not raised the issue
because they simply assumed all monitoring systems had dashboards.
It was time to pivot.
The software package that had been
the team’s second choice had very sophisticated dashboard capabilities.
however, it would be easy to throw
away the mistakes. This would enable
the team to pivot, meaning they could
change direction based on recent results. It is better to pivot early in the development process than to realize well
into it that you have built something
Google calls this “launch early and
often.” Launch as early as possible
even if that means leaving out most of
the features and launching to only a
few select users. What you learn from
the early launches informs the decisions later on and produces a better
service in the end.
Launching early and often also gives
you the opportunity to build operational infrastructure early. Some companies build a service for a year and then
launch it, informing the operations
team only a week prior. IT then has little time to develop operational practices such as backups, on-call playbooks,
and so on. Therefore, those things are
done badly. With the launch-early-and-often strategy, you gain operational
experience early and you have enough
time to do it right.
This is also known as the MVP strategy. As defined by Eric Ries in 2009, “The
minimum viable product is that version
of a new product which allows a team to
collect the maximum amount of validated learning about customers with the
least effort” (“Minimum Viable Product:
A Guide;” http://www.startuplessons-
product-guide.html). In other words,
rather than focusing on new functionality in each release, focus on testing an
assumption in each release.
The team building the monitoring
system adopted the launch-early-and-often strategy. They decided that each
iteration, or small batch, would be
one week long. At the end of the week
they would release what was running
in their beta environment to their production environment and ask for feedback from stakeholders.
For this to work they had to pick very
small chunks of work. Taking a cue
from Jason Punyon and Kevin Montrose (“Providence: Failure Is Always
an Option;” http://jasonpunyon.com/
is-always-an-option/), they called this
“What can get done by Friday?”-driven development.