Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ


Choose your language

InfoQ Homepage News CockroachDB: A Scalable, Geo-Replicated, Transactional Datastore

CockroachDB: A Scalable, Geo-Replicated, Transactional Datastore


The team behind CockroachDB, an open source datastore project, has recently announced its initial alpha version. Inspired by Google’s Spanner project, CockroachDB aims to address the current lack of an open source, scalable, geo-replicated, ACID compliant database.

Traditionally, NoSQL datastores have supported multi datacenter configurations through the use of asynchronous replication between datacenters. This is the recommended approach in MongoDB and it is the only approach available in Riak, Cassandra and HBase. However, asynchronous replication prevents consistency guarantees between datacenters. Moreover, most of the first generation NoSQL datastores failed to support arbitrary cross key transactions. Recently, new systems such as FoundationDB and HyperDex have been designed with cross key transaction support. However, these systems caution against multi datacenter configurations.

Against that backdrop Google announced their Spanner system in 2012 which broke new ground supporting both transactions and multi datacenter consistency. While the Spanner paper inspired many, it wasn’t until the CockroachDB team began on their implementation that an open source project tried to replicate Spanner’s challenging mix of features.

In the draft specification, Spencer Kimball, who worked on the Colossus file system at Google, describes CockroachDB as supporting

ACID transactional semantics and versioned values as first-class features. The primary design goal is global consistency and survivability, hence the name. … Cockroach implements a single, monolithic sorted map from key to value where both keys and values are byte strings.

CockroachDB’s transactional features are provided in two ways. Kimball explains that

If all keys affected by a logical mutation fall within the same range, atomicity and consistency are guaranteed by Raft; this is the fast commit path. Otherwise, a non-locking distributed commit protocol is employed between affected ranges. Cockroach provides snapshot isolation (SI) and serializable snapshot isolation (SSI) semantics … SSI is the default isolation; clients must consciously decide to trade correctness for performance.

The non-locking distributed commit protocol is still a work in progress but Kimball references papers from The University of Sidney, Yale, and draws special attention to a paper from Yahoo! as potential options for implementing SSI semantics.

The current alpha version provides a very minimal subset of the desired functionality. Cluster initialization and joining, gossip network, and a basic key-value REST API are currently supported but there is no raft consensus, range splitting or transactions. The project is written in Go and they are currently looking for contributors.

Rate this Article


Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Community comments

  • Atomic clocks

    by Igor Kolomiets,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Since the project is inspired by Google's Spanner one question begs to be asked - is there need for atomic clocks?

  • Re: Atomic clocks

    by Benjamin Darfler,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thats a great question. The short answer is no, it does not need atomic clocks. CockroachDB maintains slightly weaker semantics than Spanner and can therefore get away without them.

    One example of this can be seen in how the two systems deal with out of band communication (e.g. communicating via a message queue). Take the situation in which one client writes a value, then sends a message to another client which in turn updates the same value. In Spanner there is no way for the second write to occur before the first write because the atomic clocks allow for a system wide real time ordering. CockroachDB does not provide this strong guarantee and instead passes a wait time value back to the client so that it can wait an appropriate amount of time if it will be communicating out of band.

    More can be read about this in the CockroachDB spec.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p