Google Publishes Paper On Spanner Ushering a Return to Distributed Transactional Semantics

Last month, Google released details of Spanner, a highly scalable and globally replicated semi-relational database in a publication to the Operating System Design and Implementation(OSDI '12) conference. Last week Google added a video recording of the OSDI 2012 presentation by Wilson Hsieh, co-author on the paper, which focuses on some key concepts in the paper and Alex Popescu from InfoQ posted a more detailed talk from Alex Lloyd at Berlin Buzzwords. The research proves that ACID semantics need not be sacrificed for higher scalability, thus negating the belief that NoSQL is the panacea for high scalability persistence. This is emphasized through this quote from the paper:

We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.

The Spanner project resulted from the need for a highly relational and transactional yet globally scalable persistence solution for the Google AdWords system. MegaStore partially responded to these concerns since its consistency guarantees could not be achieved without predictably higher latency for continent wide transactions. Enter Spanner, the latency problems with distributed transactions is handled through what Google calls TrueTime API, which is basically a solution for the clock uncertainty problem.

Clock uncertainty (denoted by epsilon in the paper) is introduced by clock drifts and network latency in determining the clock time from various time master references in a wide area network. Time master references are a mix of GPS time masters and atomic clocks and their error rates are reduced through redundancy. By determining the factors that contribute to clock uncertainty and upper bounding it to a commit wait interval(twice epsilon), external consistency guarantees along with other benefits such as lock-free read transactions, non-blocking reads and atomic schema changes were achieved. Hence the commit wait interval is directly tied to clock uncertainty, higher the uncertainty, longer is the commit wait interval which would imply slowing down Spanner. However to reduce this dulling effect of longer commit wait intervals(typically 10ms but a long tail distribution), Spanner executes the prepare phase for Paxos(consensus protocol) or a two-phase commit during the wait period.

Spanner's data model is a semi-relational hierarchically structured model similar to Megastore. Timothy O'Brien at O'Reilly summarized the Spanner deployment in his blog post:

A Spanner deployment consists of a few management servers to manage multiple “zones” across data centers. A “Zone master” and a series of “location proxies” manage hundreds or thousands of “spanservers” that perform the bulk of the work in the Spanner database. Spanservers house units of data called “directories,” each of these units implements a Paxos state machine atop something called a tablet. Spanservers store data in B-trees using a composite key alongside a timestamp and a value.

Cloudant Labs pointed out two areas in which Spanner lacks, in their blog post:

Notably, Spanner does not yet support automatic handling of secondary indices. Further, it does not support "offline" access with later reconciliation (ala CouchDB).

Google claims that Spanner is the first globally replicated and scalable, ACID database while NuoDB has also patented their solution which appears to achieve the same from the patent description. How does this change the debate around NoSQL vs NewSQL for your product/project implementation?

InfoQ Software Architects' Newsletter

Write for InfoQ

Rate this Article

This content is in the Infrastructure topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter