CockroachDB: A Scalable, Geo-Replicated, Transactional Datastore
The team behind CockroachDB, an open source datastore project, has recently announced its initial alpha version. Inspired by Google’s Spanner project, CockroachDB aims to address the current lack of an open source, scalable, geo-replicated, ACID compliant database.
Traditionally, NoSQL datastores have supported multi datacenter configurations through the use of asynchronous replication between datacenters. This is the recommended approach in MongoDB and it is the only approach available in Riak, Cassandra and HBase. However, asynchronous replication prevents consistency guarantees between datacenters. Moreover, most of the first generation NoSQL datastores failed to support arbitrary cross key transactions. Recently, new systems such as FoundationDB and HyperDex have been designed with cross key transaction support. However, these systems caution against multi datacenter configurations.
Against that backdrop Google announced their Spanner system in 2012 which broke new ground supporting both transactions and multi datacenter consistency. While the Spanner paper inspired many, it wasn’t until the CockroachDB team began on their implementation that an open source project tried to replicate Spanner’s challenging mix of features.
ACID transactional semantics and versioned values as first-class features. The primary design goal is global consistency and survivability, hence the name. … Cockroach implements a single, monolithic sorted map from key to value where both keys and values are byte strings.
CockroachDB’s transactional features are provided in two ways. Kimball explains that
If all keys affected by a logical mutation fall within the same range, atomicity and consistency are guaranteed by Raft; this is the fast commit path. Otherwise, a non-locking distributed commit protocol is employed between affected ranges. Cockroach provides snapshot isolation (SI) and serializable snapshot isolation (SSI) semantics … SSI is the default isolation; clients must consciously decide to trade correctness for performance.
The non-locking distributed commit protocol is still a work in progress but Kimball references papers from The University of Sidney, Yale, and draws special attention to a paper from Yahoo! as potential options for implementing SSI semantics.
The current alpha version provides a very minimal subset of the desired functionality. Cluster initialization and joining, gossip network, and a basic key-value REST API are currently supported but there is no raft consensus, range splitting or transactions. The project is written in Go and they are currently looking for contributors.
Re: Atomic clocks
One example of this can be seen in how the two systems deal with out of band communication (e.g. communicating via a message queue). Take the situation in which one client writes a value, then sends a message to another client which in turn updates the same value. In Spanner there is no way for the second write to occur before the first write because the atomic clocks allow for a system wide real time ordering. CockroachDB does not provide this strong guarantee and instead passes a wait time value back to the client so that it can wait an appropriate amount of time if it will be communicating out of band.
More can be read about this in the CockroachDB spec.