Technical Perspective
Mesa Takes Data Warehousing to New Heights
By Sam Madden

DOI: 10.1145/2936720
To view the accompanying paper, visit doi.acm.org/10.1145/2936722

LEAVE IT TO Google to make business
data processing—among the stodgiest topics in the rather buttoned-up
world of database systems—seem
cool. The application here involves
producing reports over Google’s ads
infrastructure: Google executives
want to see how many ads each Google
property is serving, and how profitable
they are, and Google’s customers want
to see how many users are clicking on
their ads, how much they are paying,
and so on.
At a small scale, solving this problem is straightforward—new ad click
and sales data are appended to a database file as they are sent from the processing system, and computing the
answer to a particular query over the
data involves reading the contents of
(“scanning”) the data file to compute
a running total of the records in the
groups the user is interested in. Making this perform at the scale of Google
Ads, where billions of clicks happen
per day, is the challenge addressed by the Mesa system described in the following paper.
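At this scale the whole design fits in a few lines of code. The following is a minimal sketch of that append-and-scan approach, assuming a hypothetical comma-separated log of (ad_id, clicks, revenue) records; the file name and fields are invented for illustration and do not come from the paper.

# Minimal sketch of the small-scale approach: append new records to a file,
# then answer a query by scanning the whole file and keeping running totals.
import csv
from collections import defaultdict

LOG_FILE = "ad_events.csv"  # hypothetical file of ad_id,clicks,revenue rows

def append_events(rows):
    # New data is simply appended; existing contents are never rewritten.
    with open(LOG_FILE, "a", newline="") as f:
        csv.writer(f).writerows(rows)

def totals_by_ad():
    # Answering a query means reading ("scanning") every record.
    totals = defaultdict(lambda: [0, 0.0])  # ad_id -> [clicks, revenue]
    with open(LOG_FILE, newline="") as f:
        for ad_id, clicks, revenue in csv.reader(f):
            totals[ad_id][0] += int(clicks)
            totals[ad_id][1] += float(revenue)
    return dict(totals)

append_events([("ad-1", 3, 0.42), ("ad-2", 1, 0.10), ("ad-1", 2, 0.31)])
print(totals_by_ad())  # per-ad running totals of clicks and revenue

Every query rereads the entire file, which is precisely what stops this approach from keeping up with billions of clicks per day.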
Fundamentally, the key technique is to employ massive parallelism, both when adding new data and
when looking up specific records.
The techniques used are largely a collection of best practices developed in
the distributed systems and database
communities over the last decade,
with some clever new ideas thrown
in. Some of the highlights from this
work include:
• The use of batch updates and append-only storage. New data is not added one record at a time, but is aggregated into batches that are sent into Mesa. Instead of merging these batches into the existing storage, Mesa simply writes them out as additional files that are scanned when processing a query. This has the advantage that existing files are never modified, so queries can continue to execute while new data is added, without any concern that an in-flight query will read a batch only partially.
• Massive-scale parallel query processing. Each query is answered by one query processing node, but there can be hundreds or thousands of compute nodes. They can each answer queries independently because of the use of append-only storage: query processors never need to worry that the files they are scanning will change while they are running, and they never have to wait for one another to perform operations.
• Atomic updates. Some care is
needed to atomically install update
batches, such that they are either
not seen at all or are seen in their entirety. Mesa labels each update with a
unique, monotonically increasing version number, which is periodically communicated to each query processor. Once a query processor learns of a new version number, it will answer queries over all updates up to and including that version, and is guaranteed that the files containing those updates have been completely written and will not change.
This means it can take some time (a
few seconds to a minute) for a query
processor to see a new update, but
this will only result in a slightly stale
answer (missing the most recent update), never an answer that is missing
some arbitrary subset of updates.
• Unusual data model. Unlike most database systems, which simply represent (or "model") data as a collection of records, each with a number of attributes, Mesa allows a programmer to specify a merge function that is used to combine two records when a record with a duplicate key arrives. This makes it possible, for example, to keep a running total of clicks or revenue for a particular ad or customer. One advantage of this model is that new data can be added without first reading the old data when computing running sums and the like (see the sketch after this list).
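To see how these pieces fit together, here is a rough single-process sketch of the ideas in the list above: immutable update batches, a published version number that gates what queries may read, and a programmer-supplied merge function. It is an illustration under those assumptions only, not Mesa's implementation; every name, type, and the in-memory representation are invented for the example.

# Illustrative only: versioned, append-only batches with merge-based reads.
from typing import Callable, Dict, List, Optional, Tuple

Record = Tuple[int, float]  # (clicks, revenue) for one key

def merge(a: Record, b: Record) -> Record:
    # Programmer-specified merge function: here a running sum, so a new batch
    # can be folded in without reading the old data first.
    return (a[0] + b[0], a[1] + b[1])

class VersionedStore:
    def __init__(self, merge_fn: Callable[[Record, Record], Record]):
        self.merge_fn = merge_fn
        self.batches: List[Dict[str, Record]] = []  # append-only "files"
        self.published_version = 0                  # highest queryable version

    def install_batch(self, batch: Dict[str, Record]) -> int:
        # A batch is written in full before its version is published, so a
        # query sees either all of it or none of it.
        self.batches.append(dict(batch))
        self.published_version = len(self.batches)
        return self.published_version

    def query(self, key: str, version: Optional[int] = None) -> Record:
        # Readers scan only batches up to a published version; those batches
        # never change, so no coordination with writers is needed.
        upto = self.published_version if version is None else version
        result: Record = (0, 0.0)
        for batch in self.batches[:upto]:
            if key in batch:
                result = self.merge_fn(result, batch[key])
        return result

store = VersionedStore(merge)
store.install_batch({"ad-1": (3, 0.42), "ad-2": (1, 0.10)})
v = store.install_batch({"ad-1": (2, 0.31)})
print(store.query("ad-1", version=v))  # running totals merged across batches

A query processor that has only heard about version 1 would answer with the totals through the first batch alone: a slightly stale result, but never one that reflects a half-installed batch.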
A natural question is how Mesa compares to existing parallel transactional database systems. Such systems are optimized for high throughput, but they lack several features that Mesa requires. First, Mesa fits neatly into the elegant modular (layered) software architecture stack Google has built: it runs on top of Colossus (Google's distributed file system) and provides a substrate on which advanced query processing techniques (such as their F1 system) can be built. Layering software this way allows different engineering teams to maintain the code, and allows each layer to serve multiple clients. Many existing data processing systems are much more monolithic and would be difficult to integrate into the Google software ecosystem. Second, conventional databases were not built to replicate data across multiple datacenters. Traditional systems typically use a single-master approach for fault tolerance, replicating to a (read-only) standby that can take over when the master fails. Such a design will not work well if datacenter failures or network partitions are frequent.

Sam Madden (madden@csail.mit.edu) is a professor of computer science at the Massachusetts Institute of Technology, Cambridge, MA.

Copyright held by author.