ing a local copy of their data as a backup.
We saw this happen when we first liberated Blogger: many users started exporting their blogs every week while continuing to host and write in Blogger. This
last scenario is more rooted in emotion
than logic. Most data that users have on
their computers is not backed up at all,
whereas hosted applications almost always store multiple copies of user data
in multiple geographic locations, accounting for hardware failure in addition to natural disasters. Whether users’
concerns are logical or emotional, they
need to feel their data is safe: it’s important that your users trust you.
Case Study: Google Sites
Google Sites is a Web site creator that
allows WYSIWYG editing through the
browser. We use this service inside of
Google for our main project page, as it is
really convenient for creating or aggregating project documentation. We took
on the job of creating the import and export capabilities for Sites in early 2009.
Early in the design, we had to determine what the external format of a
Google Site should be. Considering that
the utility Sites provides is the ability to
create and collaborate on Web sites,
we decided that the format best suited
for true liberation would be XHTML.
HTML, as the language of the Web, is also
the most portable format for
a Web site: just drop the XHTML pages
on your own Web server or upload them
to your Web service provider. We wanted to make sure this form of data portability was as easy as possible with as
little loss of data as possible.
Sites uses its internal data format to
encapsulate the data stored in a Web
site, including all revisions to all pages
in the site. The first step to liberating
this data was to create a Google Data
API. A full export of a site is then provided through an open source Java client tool that uses the Google Sites Data
API and transforms the data into a set of
XHTML pages.
The Google Sites Data API, like all
Google Data APIs, is built upon the
AtomPub specification. This allows for
RPC (remote procedure call)-style programmatic access to Google Sites data
using Atom documents as the wire format for the RPCs. Atom works well for
the Google Sites use case, as the data fits
fairly easily into an Atom envelope.
Figure 1 is a sample of one Atom entry that encapsulates a Web page within
Sites. An entry like this can be retrieved from the
content feed of the Google Sites Data API.
We have highlighted (in red) the actual data that is being exported, which
includes an identifier, a last update time
in ISO 8601 format, title, revision number, and the actual Web-page content.
Mandatory authorship elements and
other optional information included in
the entry have been removed to keep the
example short.
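
To make the fields called out above concrete, here is a minimal Python sketch that reads them from an entry shaped like Figure 1. It is an illustration, not the open source Java client tool mentioned earlier. It assumes the entry sits in the standard Atom namespace (Figure 1 omits that declaration for brevity), and the names parse_entry and SAMPLE_ENTRY are hypothetical. The elided identifier is left elided, as in the figure.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace prefixes used in the lookups below. The Atom namespace is an
# assumption: Figure 1 does not show it, but Atom documents normally declare it.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "sites": "http://schemas.google.com/sites/2008",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

# Sample entry modeled on Figure 1 (identifier left elided, as in the figure).
SAMPLE_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:sites="http://schemas.google.com/sites/2008">
  <id>https://sites.google.com/feeds/content/site/...</id>
  <updated>2009-02-09T21:46:14.991Z</updated>
  <category scheme="http://schemas.google.com/g/2005#kind"
            term="http://schemas.google.com/sites/2008#webpage"
            label="webpage"/>
  <title>Maps API Examples</title>
  <sites:revision>2</sites:revision>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">... PAGE CONTENT HERE ...</div>
  </content>
</entry>
"""


def parse_entry(xml_text):
    """Return the exported fields of one Atom entry as a plain dict."""
    entry = ET.fromstring(xml_text)
    # The "updated" value is an ISO 8601 timestamp in UTC (trailing "Z").
    updated = datetime.strptime(
        entry.findtext("atom:updated", namespaces=NS).strip(),
        "%Y-%m-%dT%H:%M:%S.%fZ",
    ).replace(tzinfo=timezone.utc)
    body = entry.find("atom:content/xhtml:div", NS)
    return {
        "id": entry.findtext("atom:id", namespaces=NS).strip(),
        "updated": updated,
        "kind": entry.find("atom:category", NS).get("label"),
        "title": entry.findtext("atom:title", namespaces=NS).strip(),
        "revision": int(entry.findtext("sites:revision", namespaces=NS)),
        # Text-only view of the page body; real content would keep its markup.
        "content": "".join(body.itertext()).strip(),
    }


page = parse_entry(SAMPLE_ENTRY)
print(page["title"], page["revision"], page["updated"].isoformat())
```

Nothing in the sketch is specific to Sites beyond the sites:revision element; the rest is ordinary Atom, which is part of why the data fits the envelope so easily.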
Once the API was in place, the second
step was to implement the transformation
from a set of Atom feeds
into a collection of portable XHTML
Web pages. To protect against losing
any data from the original Atom, we
chose to embed all of the metadata
about each Atom entry right into the
transformed XHTML. Not having this
metadata in the transformed pages
poses a problem during an import: it
becomes unclear which elements of
XHTML correspond to the pieces of the
original Atom entry. Luckily, we did not
have to invent our own metadata
embedding technique; we simply used the
hAtom microformat.
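
The round trip described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual exporter: the function names to_hatom, from_hatom, and find_by_class are hypothetical, the entry is passed in as a plain dict of strings, and XML namespaces are left out to keep the sketch short. The class names (hentry, entry-title, entry-content, updated, sites:revision) are the ones that appear in Figure 2.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def to_hatom(entry):
    """Render one entry (a dict of strings) as hAtom-annotated XHTML."""
    root = ET.Element("div", {"class": "hentry " + entry["kind"]})
    heading = ET.SubElement(root, "h3")
    title = ET.SubElement(heading, "span", {"class": "entry-title"})
    title.text = entry["title"]

    content = ET.SubElement(root, "div", {"class": "entry-content"})
    content.append(ET.fromstring(entry["content_xhtml"]))

    # Human-readable date for display; the exact ISO 8601 timestamp is kept
    # in the title attribute so that nothing is lost on import.
    stamp = datetime.strptime(entry["updated"], "%Y-%m-%dT%H:%M:%S.%fZ")
    small = ET.SubElement(root, "small")
    small.text = "Updated on "
    abbr = ET.SubElement(
        small, "abbr", {"class": "updated", "title": entry["updated"]}
    )
    abbr.text = f"{stamp:%b} {stamp.day}, {stamp.year}"
    abbr.tail = " (Version "
    revision = ET.SubElement(small, "span", {"class": "sites:revision"})
    revision.text = entry["revision"]
    revision.tail = ")"
    return ET.tostring(root, encoding="unicode")


def find_by_class(root, name):
    """Return the first element whose class attribute contains name."""
    for element in root.iter():
        if name in element.get("class", "").split():
            return element
    return None


def from_hatom(xhtml):
    """Recover the original entry fields from hAtom-annotated XHTML."""
    root = ET.fromstring(xhtml)
    kind = [c for c in root.get("class").split() if c != "hentry"][0]
    body = find_by_class(root, "entry-content")[0]
    return {
        "kind": kind,
        "title": find_by_class(root, "entry-title").text,
        "updated": find_by_class(root, "updated").get("title"),
        "revision": find_by_class(root, "sites:revision").text,
        "content_xhtml": ET.tostring(body, encoding="unicode"),
    }


original = {
    "kind": "webpage",
    "title": "Maps API Examples",
    "updated": "2009-02-09T21:46:14.991Z",
    "revision": "2",
    "content_xhtml": "<div>... PAGE CONTENT HERE ...</div>",
}
exported = to_hatom(original)
print(from_hatom(exported) == original)
```

Because every field is tagged with a class name, the import side never has to guess which part of the page was the title, the timestamp, or the revision number; it only has to look the class up.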
Figure 1. Atom entry encapsulating a Web page within Sites.
<entry xmlns:sites="http://schemas.google.com/sites/2008">
  <id>https://sites.google.com/feeds/content/site/...</id>
  <updated>2009-02-09T21:46:14.991Z</updated>
  <category scheme="http://schemas.google.com/g/2005#kind"
            term="http://schemas.google.com/sites/2008#webpage"
            label="webpage"/>
  <title>Maps API Examples</title>
  <sites:revision>2</sites:revision>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      ... PAGE CONTENT HERE ...
    </div>
  </content>
</entry>
Figure 2. Atom entry converted into XHTML.
<div class="hentry webpage">
  <h3>
    <span class="entry-title">Maps API Examples</span>
  </h3>
  <div>
    <div class="entry-content">
      <div xmlns="http://www.w3.org/1999/xhtml">
        ... PAGE CONTENT HERE ...
      </div>
    </div>
  </div>
  <small>
    Updated on
    <abbr class="updated" title="2009-02-09T21:46:14.991Z">
      Feb 9, 2009
    </abbr>
    (Version <span class="sites:revision">2</span>)
  </small>
</div>