New features were added on demand,
with new functions written as needed.
Because the demo data was being generated this way, it was easy to
regenerate and iterate. For example,
the marketing manager would come
to us and say, “More cowbell!” and
we could add a statement such as
GenerateAndInject(cowbell). The next
day we would be told, “The cowbell
looks too blue. Can it be red instead?”
and we would add code to turn it red.
Rerun the code and we were ready to
show the next iteration.
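A generate-and-inject step of that kind might be sketched in Python as follows; the function, its arguments, and the record shape are hypothetical stand-ins, not the team's actual library:

```python
# Hypothetical sketch: each demo element is produced by a function,
# so a change request ("make the cowbell red") becomes a one-line
# edit followed by a rerun, not a manual hunt through the dataset.
def generate_and_inject(db, item, **attrs):
    """Create a demo record for `item` and append it to `db`."""
    record = {"item": item}
    record.update(attrs)  # e.g., color="red"
    db.append(record)
    return record

demo_db = []
generate_and_inject(demo_db, "cowbell", color="blue")

# Next day: "The cowbell looks too blue." Change one argument, rerun:
demo_db.clear()
generate_and_inject(demo_db, "cowbell", color="red")
```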
Anonymization is particularly difficult to get right on the first try. People are very bad at anonymizing data.
Algorithms are not always that much
better. There will be many attempts to
get this right. Once it is deemed “good
enough,” invariably the source data
will change. Having the process automated is a blessing.
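As an illustration of why automating this step helps, here is one simple, hypothetical anonymization pass. The salted-hash technique shown is only a sketch, not the method the team used, and would not by itself constitute strong anonymization:

```python
import hashlib

SALT = "demo-v1"  # hypothetical fixed salt so reruns are repeatable

def anonymize_name(name):
    """Replace a real name with a stable, meaningless token."""
    digest = hashlib.sha256((SALT + name).encode()).hexdigest()[:8]
    return "user_" + digest

rows = [{"customer": "Alice Example"}, {"customer": "Bob Example"}]
anon = [{**r, "customer": anonymize_name(r["customer"])} for r in rows]
# When the source data changes, rerunning the pipeline reproduces the
# same tokens for unchanged names, and real names never appear.
```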
Notice the example code includes
comments to record the provenance
of the data and various approvals. We
will be very glad these were recorded if
there are ever questions, complaints,
audits, or legal issues.
This was so much better than hand-editing the data.
This approach really paid off a
few months later when it was time to
update the demo. Version 2.0 of the
software was about to ship, and the
marketing managers wanted three
changes. First, they wanted data that
was more up to date. That was no problem. We added a function that moved
all dates in the data forward by three
months, thus providing a fresher look.
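Such a date-shifting function might look like the following sketch; the field names are hypothetical, and three months is approximated as 90 days:

```python
import datetime

SHIFT = datetime.timedelta(days=90)  # roughly three months

def freshen_dates(rows, date_fields=("created", "updated")):
    """Move every date in the dataset forward, keeping intervals intact."""
    for row in rows:
        for field in date_fields:
            if field in row:
                row[field] += SHIFT
    return rows

orders = [{"id": 1, "created": datetime.date(2023, 1, 15)}]
freshen_dates(orders)  # the whole dataset now looks 90 days fresher
```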
Next, the demo script included a story
arc to show off a new feature, and we
needed to supply data to support
it. That was easy, too, as we could
generate appropriate data and integrate it into the database. Lastly, the
new demo needed to use the newest
version of the software, which had a
different database schema. The code
was updated as appropriate.
Oh, and it still needed to do all the
things the old demo did.
If the demo data had been handcrafted, these changes would have
been nearly impossible. We would
have had to reproduce every single
manual change and update. Who
the heck could remember every little
detail?

Marketing assumed the project was a one-time thing. That is,
the data would be generated once and
be perfect on the first try; the engineers could then wash their hands of
it and return to their regularly scheduled work.
This assumption was intended to
be a compliment to the engineers,
but, “Oh, please, this will just take an
afternoon!” is not a tenet of good project management.
I don’t know about you, but I have
never produced something for marketing without being asked for at least
one revision or adjustment. This is a
creative collaboration between two
groups of people. Any such project
requires many iterations and experiments before the results are good
enough.

Marketing believed that by keeping the requirements vague, it would
be easier for the engineers to produce
the perfect dataset on the first try. This
is the opposite of reality. By doing this,
marketing unknowingly requested a
waterfall approach, thinking that a
one-and-done approach would be less
wasteful of the engineers’ time. The
reality is that a big-bang, get-it-all-right-the-first-time approach always
fails.

The primary engineer assigned
to the project quickly spotted these
red flags and realized that to make
this project a success, he needed an
approach that would allow for iteration now and provide the ability to
efficiently update the project months
later when version 2.0 of the software
would necessitate an updated demo.
To fix this, the engineer created
a system to generate the demo data
from other data. It would programmatically modify the data as needed.
Thus, future updates could simply
regenerate the data from scratch,
with slightly different operations performed on the data.
The system he created was basically a tiny language for extracting and
modifying data in a repeatable way.
Some of the features included:
˲ The ability to import data from
other sources.
˲ The ability to insert predefined
(static) data examples.
˲ Functions to extract data from
one database, with or without clipping
the number of rows.
˲ Synthesizing fake data by calling
generator functions.
˲ Transforming data using function
calls.
˲ Various anonymization methods.
The data was generated with a “program” illustrated in the accompanying figure.
This is not so much a new language
as it is a library of reusable functions.
Pseudo-code for generating the demo data.
# Salespeople need to be able to show “problem X.”
# We found this data in customer1’s dataset, but we
# only need the first 200 rows.
# NB: Approval to use customer1’s data is in ticket #12345.
# NB: Anonymization technique signed-off in ticket #45678.

# The next thing sales will demonstrate is what it
# looks like when problem X is fixed. Function X
# generates data that looks that way. It bases this
# off dataset2.data, provided by marketing.

# There is a requirement that at least one “problem Y”
# will be seen in the data. We hand-created that data.
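One way the figure's pseudo-code might translate into runnable Python is sketched below; the function names, datasets, and record shapes are hypothetical stand-ins for the team's actual library:

```python
# Hypothetical rendering of the figure's steps, not the real library.

def extract(rows, clip=None):
    """Pull rows from a source dataset, optionally keeping only the first `clip`."""
    return list(rows[:clip]) if clip else list(rows)

def anonymize(rows, field):
    """Replace a sensitive field with a numbered placeholder."""
    return [{**r, field: f"anon_{i}"} for i, r in enumerate(rows)]

def generate_fixed_state(base_rows):
    """Derive 'problem X is fixed' records from a base dataset."""
    return [{**r, "status": "fixed"} for r in base_rows]

# Salespeople need to show "problem X"; first 200 rows of customer1.
# NB: approvals recorded in tickets #12345 and #45678.
customer1 = [{"customer": f"Real Name {i}", "status": "problem X"}
             for i in range(500)]
demo = anonymize(extract(customer1, clip=200), "customer")

# Next, show what it looks like when problem X is fixed (from dataset2).
dataset2 = [{"customer": "sample", "status": "problem X"}]
demo += generate_fixed_state(anonymize(dataset2, "customer"))

# At least one "problem Y" must appear; that row was hand-created.
demo.append({"customer": "anon_static", "status": "problem Y"})
```

Rerunning a script like this from scratch is what makes the later updates (fresher dates, new story arcs, a new schema) cheap: the changes go into the program, not the data.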