Google unveils Mesa - Geo-Replicated Near-Realtime Scalable Data Warehouse
Google has published a paper describing their new data-warehouse system used internally by Google called Mesa. Mesa scales across multiple data centers and processes petabytes of data, while being able to respond to queries in sub-second time and maintain ACID properties.
Mesa was designed mainly around Google’s advertising use-case. According to Google, as their ad platform continued to expand, and customers required greater visibility into their advertising campaigns, the demand for more detailed and fine-grained information led to tremendous growth in the data size. Google built Mesa to be able to handle the ever increasing amount of data, while providing consistency and ability to query data in near-realtime. Taken verbatim from Google’s while-paper, the requirements for Mesa are:
Atomic Updates. A single user action may lead to multiple updates at the relational data level, affecting thousands of consistent views, defined over a set of metrics (e.g., clicks and cost) across a set of dimensions (e.g., advertiser and country). It must not be possible to query the system in a state where only some of the updates have been applied.
Consistency and Correctness. For business and legal reasons, this system must return consistent and correct data. We require strong consistency and repeatable query results even if a query involves multiple datacenters.
Availability. The system must not have any single point of failure. There can be no downtime in the event of planned or unplanned maintenance or failures, including outages that affect an entire datacenter or a geographical region.
Near Real-Time Update Throughput. The system must support continuous updates, both new rows and incremental updates to existing rows, with the update volume on the order of millions of rows updated per second. These updates should be available for querying consistently across different views and datacenters within minutes.
Query Performance. The system must support latency- sensitive users serving live customer reports with very low latency requirements and batch extraction users requiring very high throughput. Overall, the system must support point queries with 99th percentile latency in the hundreds of milliseconds and overall query throughput of trillions of rows fetched per day.
Scalability. The system must be able to scale with the growth in data size and query volume. For example, it must support trillions of rows and petabytes of data. The update and query performance must hold even as these parameters grow significantly.
Online Data and Metadata Transformation. In order to support new feature launches or change the granularity of existing data, clients often require transformation of the data schema or modifications to existing data values. These changes must not interfere with the normal query and update operations.
According to Google, none of its existing big data technologies are able to meet all of these requirements. BigTable does not provide atomicity and strong consistency. Megastore, Spanner, and F1 do provide strong consistency across geo-replicated data, but they do not support the peak update throughput needed by Mesa clients.
However, Mesa does leverage existing Google components for various parts of its infrastructure. It uses BigTable to store all persistent metadata, and it uses Colossus (Google’s distributed file system) to store the data files. In addition, Mesa leverages MapReduce for processing sequential data.
Mesa’s conceptual data-model is similar to a traditional relational database. All data is stored in tables. A table may also be a materialized view of another table. Each table has a schema that specifies its structure. Because “how many” is such a prevalent question in advertising, an aggregate function such as “SUM” can be specified as part of table's definition. The schema also specifies one or more indexes for a table.
One of the most interesting aspects of Mesa is the way it handles updates. The data stored in Mesa is multi-versioned, allowing it to serve consistent data from previous states while new updates are being processed. An upstream system generates update data in batches, typically once every few minutes. Separate stateless committer instances are responsible for coordinating updates across all the data-centers where Mesa runs. The committer assigns each update batch a new version number and publishes all metadata associated with the update to the versions database, built on top of Paxos consensus algorithm. When a commit criteria is met, meaning that a given update has been incorporated by a sufficient number of Mesa instances worldwide, the committer declares the update’s version number to be the new committed version and stores that value in the versions database. Queries are always issues against the committed version.
Because queries are always issued against the committed version, Mesa does not require any locking between updates and queries. Updates are applied asynchronously in batches by the Mesa instances. These properties allow Mesa to achieve high query and update throughputs, while also guaranteeing data consistency.
Google presents several benchmarks for Mesa’s update and query performance. A sample data source, on average, reads 30 to 60 megabytes of compressed data per second, updating 3 to 6 million distinct rows, and adding 300 thousand new rows. During a single day, Mesa executes roughly 500 million queries, returning 1.7 to 3.2 trillion rows, with the average latency of 10 milliseconds, and 99th percentile latency under 100 milliseconds.
According to Google, the amount of data stored in Mesa has increased five-fold over the last two years. This implies that Mesa has been in Production and used internally by Google for at least this long.
If you’d like to put on your geek hat and learn more about Mesa, you can refer to Google’s Mesa white paper.
Ralph Winzinger Nov 25, 2014
John Krewson, Steve Ropa and Matt Badgley Nov 24, 2014