Mesa: A Geo-Replicated Online Data Warehouse for Google’s Advertising System
By Ashish Gupta, Fan Yang, Jason Govig, Adam Kirsch, Kelvin Chan, Kevin Lai, Shuo Wu, Sandeep Dhoot, Abhilash Rajesh
Kumar, Ankur Agiwal, Sanjay Bhansali, Mingsheng Hong, Jamie Cameron, Masood Siddiqi, David Jones, Jeff Shute, Andrey
Gubarev, Shivakumar Venkataraman, and Divyakant Agrawal
DOI: 10.1145/2936722
Abstract
Mesa is a highly scalable analytic data warehousing system
that stores critical measurement data related to Google’s
Internet advertising business. Mesa is designed to satisfy a
complex and challenging set of user and systems requirements, including near real-time data ingestion and retrieval,
as well as high availability, reliability, fault tolerance, and
scalability for large data and query volumes. Specifically,
Mesa handles petabytes of data, processes millions of row
updates per second, and serves billions of queries that fetch
trillions of rows per day. Mesa is geo-replicated across multiple datacenters and provides consistent and repeatable
query answers at low latency, even when an entire datacenter fails. This paper presents the Mesa system and reports
the performance and scale that it achieves.
1. INTRODUCTION
Google runs an extensive advertising platform across multiple channels that serves billions of advertisements (or ads)
every day to users all over the globe. Detailed information
associated with each served ad, such as the targeting criteria and the number of impressions and clicks, is recorded
and processed in real time. This data is used extensively at
Google for different use cases, including reporting, internal auditing, analysis, billing, and forecasting. Advertisers
gain fine-grained insights into their advertising campaign
performance by interacting with a sophisticated front-end
service that issues online and on-demand queries to the
underlying data store. Google’s internal ad serving platforms use this data in real time, determining budgeting
and ad performance to enhance ad serving relevancy. As
the Google ad platform continues to expand and as internal
and external customers request greater visibility into their
advertising campaigns, the demand for more detailed and
fine-grained information leads to tremendous growth in
the data size. The scale and business critical nature of this
data result in unique technical and operational challenges
for processing, storing, and querying. The requirements for
such a data store are:
Atomic Updates. A single user action may lead to multiple
updates at the relational data level, affecting thousands of
consistent views, defined over a set of metrics (e.g., clicks
and cost) across a set of dimensions (e.g., advertiser and
country). It must not be possible to query the system in a
state where only some of the updates have been applied.
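One way to picture this requirement is a store that stages each batch of updates privately and exposes it to queries only by advancing a single committed version number, so that every view changes in the same step. The Python sketch below is an illustrative assumption, not a description of Mesa's implementation; the class `Store`, its methods, and the view and column names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]  # hypothetical row: dimension columns plus "clicks" and "cost"


class Store:
    """Toy in-memory store: queries see only fully committed update batches."""

    def __init__(self, views: Dict[str, Tuple[str, ...]]) -> None:
        self.views = views      # view name -> dimension columns it groups by
        self.batches: List[Dict[str, Dict[tuple, list]]] = []
        self.committed = 0      # number of batches visible to queries

    def stage(self, rows: List[Row]) -> int:
        """Aggregate one batch into every view without exposing it yet."""
        batch: Dict[str, Dict[tuple, list]] = {name: {} for name in self.views}
        for row in rows:
            for name, dims in self.views.items():
                key = tuple(row[d] for d in dims)
                agg = batch[name].setdefault(key, [0, 0.0])
                agg[0] += row["clicks"]
                agg[1] += row["cost"]
        self.batches.append(batch)
        return len(self.batches)  # version number of the staged batch

    def commit(self, version: int) -> None:
        """Expose all batches up to `version` to every view in one step."""
        self.committed = max(self.committed, version)

    def query(self, view: str, key: tuple) -> Tuple[int, float]:
        """Sum the metrics for `key` over committed batches only."""
        clicks, cost = 0, 0.0
        for batch in self.batches[: self.committed]:
            c, s = batch[view].get(key, (0, 0.0))
            clicks += c
            cost += s
        return clicks, cost


store = Store({"by_advertiser": ("advertiser",), "by_country": ("country",)})
version = store.stage([
    {"advertiser": "a1", "country": "US", "clicks": 3, "cost": 1.5},
    {"advertiser": "a1", "country": "DE", "clicks": 2, "cost": 0.5},
])
before = store.query("by_advertiser", ("a1",))   # staged batch is not visible
store.commit(version)
after = store.query("by_advertiser", ("a1",))    # whole batch is visible
```

Because visibility is controlled by one integer, a query can never observe a state in which one view reflects the batch and another does not.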
Consistency and Correctness. For business and legal reasons, this system must return consistent and correct data.
We require strong consistency and repeatable query results
even if a query involves multiple datacenters.
Availability. The system must not have any single point of
failure. There can be no downtime in the event of planned or
unplanned maintenance or failures, including outages that
affect an entire datacenter or a geographical region.
Near Real-Time Update Throughput. The system must support continuous updates, both new rows and incremental
updates to existing rows, with the update volume on the
order of millions of rows updated per second. These updates
should be available for querying consistently across different views and datacenters within minutes.
Query Performance. The system must support latency-sensitive
users serving live customer reports with very low latency
requirements and batch extraction users requiring very
high throughput. Overall, the system must support point
queries with 99th percentile latency in the hundreds of milliseconds and overall query throughput of trillions of rows
fetched per day.
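For concreteness, the 99th percentile latency is the value at or below which 99% of observed query latencies fall, so it reflects the slow tail rather than the typical query. The following sketch uses the nearest-rank definition, which is only one of several common conventions and is an assumption here; the function name and the sample latencies are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast queries and 2 slow ones.
latencies_ms = [120.0] * 98 + [900.0] * 2
p50 = percentile_nearest_rank(latencies_ms, 50)  # 120.0, the typical query
p99 = percentile_nearest_rank(latencies_ms, 99)  # 900.0, dominated by the tail
```

In this made-up sample the median looks healthy while the 99th percentile exposes the two slow queries, which is why a tail percentile is the stricter target.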
Scalability. The system must be able to scale with the growth
in data size and query volume. For example, it must support
trillions of rows and petabytes of data. The update and query
performance must hold even as these parameters grow
significantly.
Online Data and Metadata Transformation. In order to support new feature launches or change the granularity of existing data, clients often require transformations of the data
schema or modifications to existing data values. These
changes must not interfere with the normal query and
update operations.
Mesa is Google’s solution to these technical and operational
challenges for business critical data. Mesa is a distributed,
replicated, and highly available data processing,
storage, and query system for structured data. Mesa ingests
data generated by upstream services, aggregates and persists
the data internally, and serves the data via user queries.
Even though this paper mostly discusses Mesa in the context
The original version of this paper, entitled “Mesa: Geo-
Replicated, Near Real-Time, Scalable Warehousing,” was
published in the Proceedings of the VLDB Endowment 7, 12
(Aug. 2014), 1259–1270.