What Data Liberation Looks Like
At Google, our attitude has always been
that users should be able to control the
data they store in any of our products,
and that means they should be able to
get their data out of any product. Period.
There should be no additional monetary cost to do so, and perhaps most importantly, the amount of effort required
to get the data out should be constant,
regardless of the amount of data. Individually downloading a dozen photos is
no big inconvenience, but what if a user
had to download 5,000 photos, one at a
time, to get them out of an application?
That could take weeks of their time.
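One way to keep the user's effort constant is to put every item behind a single export action that returns one archive. The Python sketch below shows the idea. It assumes photos are held in memory as a hypothetical mapping from filename to raw bytes, and `export_all` is an illustrative name, not a Google API.

```python
import io
import zipfile

def export_all(photos: dict[str, bytes]) -> bytes:
    """Bundle every photo into one ZIP archive in a single call.

    `photos` is a hypothetical in-memory store mapping filename -> raw bytes.
    The user's effort (one request, one download) is the same whether the
    store holds a dozen photos or 5,000.
    """
    buffer = io.BytesIO()
    # ZIP_STORED: JPEGs are already compressed, so recompressing gains little.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, data in sorted(photos.items()):
            archive.writestr(name, data)
    return buffer.getvalue()

if __name__ == "__main__":
    # Fake photo data standing in for a real user's library.
    store = {f"photo_{i:04d}.jpg": b"\xff\xd8 fake jpeg bytes" for i in range(5000)}
    blob = export_all(store)
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        print(len(archive.namelist()))  # 5000
```

The archive format itself (ZIP) is publicly specified, and the photos inside keep their original format, so the user receives their data in a form other software can already open.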
Even if users have a copy of their
data, it can still be locked in if it is in a
proprietary format. Some word processor documents from 15 years ago cannot be opened with modern software
because they are stored in a proprietary
format. It is important, therefore, not
only to have access to data, but also to
have it in a format that has a publicly
available specification. Furthermore,
the specification must have reasonable license terms: for example, it
should be royalty-free to implement.
If an open format already exists for the
exported data (for example, JPEG or
TIFF for photos), then that should be
an option for bulk download. If there
is no industry standard for the data in
a product (for example, blogs do not
have a standard data format), then
at the very least the format should be
publicly documented—bonus points if
your product provides an open source
reference implementation of a parser
for your format.
The point is that users should be in
control of their data, which means they
need an easy way of accessing it. Providing an API or the ability to download
5,000 photos one at a time does not exactly make it easy for your average user
to move data in or out of a product.
From the user-interface point of view,
users should see data liberation merely
as a set of buttons for import and export
of all data in a product.
Google is addressing this problem
through its Data Liberation Front,
an engineering team whose goal is to
make it easier to move data in and out
of Google products. The data liberation effort focuses specifically on data
that could hinder users from switching to another service or competing
product—that is, data that users create in or import into Google products.
This is all data stored intentionally via
a direct action—such as photos, email,
documents, or ad campaigns—that users would most likely need a copy of if
they wanted to take their business elsewhere. Data indirectly created as a side
effect (for example, log data) falls outside of this mission, as it is not particularly relevant to lock-in.
Another “non-goal” of data liberation is to develop new standards: we allow users to export in existing formats
where we can, as in Google Docs where
users can download word processing
files in OpenOffice or Microsoft Office
formats. For products where there is
no obvious open format that can contain all of the information necessary,
we provide something easily machine
readable such as XML (for example,
for Blogger feeds, including posts and
comments, we use Atom), publicly
document the format, and, where possible, provide a reference implementation of a parser for the format (see the
Google Blog Converters AppEngine
project for an example[a]). We try to give
the data to the user in a format that
makes it easy to import into another
product. Since Google Docs deals
with word processing documents and
spreadsheets that predate the rise of
the open Web, we provide a few different formats for export; in most products, however, we assiduously avoid
the rat hole of exporting into every
known format under the sun.
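A reference parser for an Atom-based export can be quite small. The Python sketch below uses only the standard library's `xml.etree.ElementTree` to list the entry titles in an Atom feed. The sample feed is invented for illustration and does not reproduce Blogger's actual export schema, which distinguishes posts from comments in ways not shown here; `entry_titles` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

def entry_titles(feed_xml: str) -> list[str]:
    """Return the title of every <entry> in an Atom feed, in document order."""
    root = ET.fromstring(feed_xml)
    titles = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.find("atom:title", ATOM_NS)
        # A missing or empty <title> becomes an empty string.
        titles.append((title.text or "") if title is not None else "")
    return titles

# Hypothetical sample export: one post and one comment as Atom entries.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog export</title>
  <entry><title>First post</title><content type="text">Hello</content></entry>
  <entry><title>A comment</title><content type="text">Nice post!</content></entry>
</feed>"""

if __name__ == "__main__":
    print(entry_titles(SAMPLE_FEED))  # ['First post', 'A comment']
```

Because Atom is an existing, documented format, a parser like this needs nothing beyond a general-purpose XML library, which is what makes the exported data easy to import elsewhere.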
The User’s View
There are several scenarios where users might want to get a copy of their
data from your product: they may have
found another product that better suits
their needs and they want to bring their
data into the new product; you have announced that you are going to stop supporting the product they are using; or,
worse, you may have done something to
lose their trust.
Of course, just because your users
want a copy of their data does not necessarily mean they are abandoning your
product. Many users just feel safer hav-
[a] http://code.google.com/p/google-blog-converters-appengine/wiki/BloggerExportTemplate; and http://code.google.com/apis/blogger/docs/2.0/reference.html#LinkCommentsToPosts.