broken down by various attributes, such as the date of the
click, the advertiser, the publisher website that showed the
ad, and the country (key columns). The aggregation function used for both value columns is SUM. All metrics are
consistently represented across the three tables, assuming
the same underlying events have updated data in all these
tables. Figure 1 is a simplified view of Mesa’s table schemas.
In production, Mesa contains over a thousand tables, many
of which have hundreds of columns, using various aggregation functions.
2. 2. Updates and queries
To achieve high update throughput, Mesa applies updates
in batches. The update batches themselves are produced
by an upstream system outside of Mesa, typically at a frequency of every few minutes (smaller and more frequent
batches would imply lower update latency, but higher
resource consumption). Formally, an update to Mesa specifies a version number n (sequentially assigned from 0) and
a set of rows of the form (table name, key, value). Each
update contains at most one aggregated value for every
(table name, key).
A query to Mesa consists of a version number n and a
predicate P on the key space. The response contains one row
for each key matching P that appears in some update with
version between 0 and n. The value for a key in the response
is the aggregate of all values for that key in those updates.
Mesa actually supports more complex query functionality
than this, but all of that can be viewed as pre-processing and
post-processing with respect to this primitive.
As an example, Figure 2 shows two updates correspond-
ing to tables defined in Figure 1 that, when aggregated,
yield tables A, B, and C. To maintain table consistency
(as discussed in Section 2. 1), each update contains con-
sistent rows for the two tables, A and B. Mesa computes
the updates to table C automatically, because they can be
derived directly from the updates to table B. Conceptually,
a single update including both the AdvertiserId and
PublisherId attributes could also be used to update
all three tables, but that could be expensive, especially
in more general cases where tables have many attributes
(e.g., due to a cross product).
Note that table C corresponds to a materialized view
of the following query over table B: SELEC T SUM(Clicks),
SUM(Cost) GROUP BY AdvertiserId, Country. This
query can be represented directly as a Mesa table because
the use of SUM in the query matches the use of SUM as the
aggregation function for the value columns in table B. Mesa
restricts materialized views to use the same aggregation
functions for metric columns as the parent table.
To enforce update atomicity, Mesa uses a multi-versioned approach. Mesa applies updates in order by version number, ensuring atomicity by always incorporating
an update entirely before moving on to the next update.
Users can never see any effects from a partially incorporated update.
The strict ordering of updates has additional applications beyond atomicity. Indeed, the aggregation functions
in the Mesa schema may be non-commutative, such as in the
standard key-value store use case where a (key, value) update
completely overwrites any previous value for the key. More
subtly, the ordering constraint allows Mesa to support use
cases where an incorrect fact is represented by an inverse
action. In particular, Google uses online fraud detection
to determine whether ad clicks are legitimate. Fraudulent
clicks are offset by negative facts. For example, there could
be an update version 2 following the updates in Figure 2
that contains negative clicks and costs, corresponding to
marking previously processed ad clicks as illegitimate. By
enforcing strict ordering of updates, Mesa ensures that a
negative fact can never be incorporated before its positive
counterpart.
2. 3. Versioned data management
Versioned data plays a crucial role in both update and query
processing in Mesa. However, it presents multiple challenges. First, given the aggregatable nature of ads statistics,
storing each version independently is very expensive from
the storage perspective. The aggregated data can typically
be much smaller. Second, going over all the versions and
aggregating them at query time is also very expensive and
increases the query latency. Third, naïve pre-aggregation of
all versions on every update can be prohibitively expensive.
To handle these challenges, Mesa pre-aggregates certain versioned data and stores it using deltas, each of which
consists of a set of rows (with no repeated keys) and a delta
version (or, more simply, a version), represented by [V1, V2],
where V1 and V2 are update version numbers and V1 ≤ V2. We
refer to deltas by their versions when the meaning is clear.
The rows in a delta [V1, V2] correspond to the set of keys that
appeared in updates with version numbers between V1 and V2
(inclusively). The value for each such key is the aggregation
of its values in those updates. Updates are incorporated
into Mesa as singleton deltas (or, more simply, singletons).
Figure 2. Two Mesa updates.
Date PublisherId Country Clicks Cost
2013/12/31 100 US + 10 + 32
2014/01/01 100 US +150 + 80
2014/01/01 200 UK + 40 + 20
(a) Update version 0 for Mesa table A
Date AdvertiserId Country Clicks Cost
2013/12/31 1 US + 10 + 32
2014/01/01 2 UK + 40 + 20
2014/01/01 2 US +150 + 80
(b) Update version 0 for Mesa table B
Date PublisherId Country Clicks Cost
2014/01/01 100 US + 55 + 23
2014/01/01 200 UK + 60 + 30
(c) Update version 1 for Mesa table A
Date AdvertiserId Country Clicks Cost
2013/01/01 1 US + 5 + 3
2014/01/01 2 UK + 60 + 30
2014/01/01 2 US + 50 + 20
(d) Update version 1 for Mesa table B