By storing the code and other source materials in a
version-control system such as Git,
you get the benefit of change history
and collaboration through pull requests (PRs). Modern CI/CD (
continuous integration/continuous delivery)
systems can be used to provide data
that is always fresh and relevant.
Ideally the demo data should be
part of the release cycle, not an afterthought. Feature requests would include the sales narrative and supporting sample data. The feature and the
corresponding demo elements would
be developed concurrently and delivered at the same time.
A casual request for a demo dataset may seem like a one-time task that does not need to be automated, but in reality it is a collaborative process requiring multiple iterations and experimentation. There will undoubtedly be requests for revisions big and small, the need to match changing software, and new and revised demo stories to support. All of this makes automating the process worthwhile.
Modern scripting languages make it easy to create ad hoc functions that act like a little language. A repeatable process helps collaboration, enables delegation, and saves time now and in the future.
Thanks to George Reilly (Stripe) and
the many anonymous reviewers for
their helpful suggestions.
Thomas A. Limoncelli is the SRE manager at Stack
Overflow Inc. in New York City. His books include The
Practice of System and Network Administration, The
Practice of Cloud System Administration, and Time
Management for System Administrators. He blogs at
EverythingSysadmin.com and tweets at @YesThatTom.
Copyright held by author/owner.
Publication rights licensed to ACM.
Luckily, we did not have to remember. The code told us every decision
we had made. What about the time
one data value was cut in half so it displayed better? Nobody had to remember that. There was even a comment in
the code explaining why we did it. The time we changed every data point labeled “Boise” to read “Paris”? Nobody had to remember that either. Heck, the Makefile encoded exactly how the raw customer data was extracted and transformed.
We were able to make the requested
changes easily. Even the change in database schema was not a big problem
because the generator used the same
library as the product. It just worked.
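The Makefile mentioned above might have looked something like this sketch (all file and script names here are invented for illustration):

```make
# Hypothetical pipeline: every step from raw export to demo-ready
# dataset is a named rule, so the process is documented and repeatable.

demo.json: clean.csv generate.py
	python generate.py clean.csv > demo.json

clean.csv: raw_export.csv anonymize.py
	python anonymize.py raw_export.csv > clean.csv

raw_export.csv: extract.sql
	psql -f extract.sql --csv > raw_export.csv
```

Running `make demo.json` rebuilds only the stages whose inputs changed, and the file itself is a record of the whole process.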
Yes, we did manually go over the
sales script and make sure that we did
not break any of the stories told during the demo. We probably could have
implemented unit tests to make sure
we did not break or lose them, but in
this case manual testing was OK.
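In a project today, those manual checks could be captured as unit tests: one assertion per demo story. A minimal sketch in Python (the records and story details are invented for illustration):

```python
# Hypothetical demo dataset; in practice this would be loaded from the
# generator's output.
demo_data = [
    {"customer": "Acme Corp", "region": "Paris", "monthly_spend": 1800.0},
    {"customer": "Globex", "region": "Paris", "monthly_spend": 1320.75},
]

def test_growth_story_customer_exists():
    # The sales script opens on Acme Corp's dashboard; it must exist.
    assert any(r["customer"] == "Acme Corp" for r in demo_data)

def test_no_real_locations_leak():
    # Anonymization story: the real city name must never appear.
    assert all(r["region"] != "Boise" for r in demo_data)

def test_values_fit_the_chart():
    # The demo shows a chart whose y-axis tops out at 2,000.
    assert all(0 < r["monthly_spend"] < 2000 for r in demo_data)

# Run after every regeneration (directly, or via a test runner):
for test in (test_growth_story_customer_exists,
             test_no_real_locations_leak,
             test_values_fit_the_chart):
    test()
print("all demo-story checks passed")
```

Each test names the story it protects, so a failure points straight at the part of the demo that broke.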
Creating the little language took longer than the initial “just an afternoon” estimate. It may have
looked like a gratuitous delay to outsiders. There was pressure to “just get
it done” and not invest in making a
reusable framework. However, by resisting that pressure we were able to
rapidly turn around change requests,
deliver the final demo on time, and
save time in the future.
Another benefit of this approach
was that it distributed the work. Automation enables delegation. Small
changes could be done by anyone;
thus, the primary engineer was not a
single point of failure for updates and
revisions. Junior engineers were able
to build experience by being involved.
I highly recommend this kind of
technique any time you need to make
a synthetic dataset. This is commonly
needed for sales demos, developer test
data, functional test data, load testing
data, and many other situations.
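When building such a dataset, one habit pays off immediately: seed the random generator so the synthetic data is reproducible. A sketch (the field names and value ranges are invented):

```python
import random

def make_synthetic_customers(n, seed=42):
    """Generate n fake customer records; the same seed always yields
    the same records, so reruns and diffs are meaningful."""
    rng = random.Random(seed)  # local, seeded RNG: deterministic output
    regions = ["Paris", "Lyon", "Nice"]
    return [
        {
            "id": i,
            "region": rng.choice(regions),
            "monthly_spend": round(rng.uniform(100, 2000), 2),
        }
        for i in range(n)
    ]

rows = make_synthetic_customers(1000)
print(len(rows), "records; first region:", rows[0]["region"])
```

Because the output is deterministic, the generated file can even be checked into version control and reviewed like any other change.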
The tools for making such a system
are much better than they used to be.
The project described here happened
many years ago when the available
tools were Perl, awk, and sed. Modern tools make this much easier. Python and Ruby make it easy to create
little languages. R has many libraries
specifically for importing, cleaning,
and manipulating data.