Basho, the creator and developer of Riak, a highly available, distributed, NoSQL database, has announced the release of Riak 2.0. This new version, over a year in the making, contains a number of significant updates including a redesigned search implementation, new distributed data types, optional strong consistency guarantees, security features, and a number of other improvements, as follows:
The new search implementation, codenamed Yokozuna, is a complete rewrite of the search implementation in Riak 1.4. The new design moves away from building search into the core of Riak and instead builds on Solr, a distributed search project from Apache. Whereas the legacy search system provided a Solr-like API on top of Riak’s key-value interface, the new search system acts as an integration layer between Riak’s distributed key-value store and the full power of Solr itself. The Yokozuna project contains a list of helpful resources for those looking for more technical details.
The 1.4 release of Riak introduced an implementation of an eventually consistent counter. Riak 2.0 builds on this work by adding support for four new eventually consistent data types: flags, registers, sets, and maps. All of these data types are based on a line of distributed systems research known as Conflict-free Replicated Data Types, or CRDTs. For those interested in the research underpinnings of CRDTs, Basho developer Chris Meiklejohn has compiled a CRDTs reading list.
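To give a feel for what "conflict-free" means, here is a minimal sketch of a grow-only counter, one of the simplest CRDTs. It is an illustration only, not Riak's implementation; the class name `GCounter` and the replica names are hypothetical. Each replica increments its own slot, and replicas converge by taking the per-slot maximum, so merges can arrive in any order or be repeated without changing the result.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""

    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, replica: str, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[replica] = self.counts.get(replica, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # Per-slot maximum is commutative, associative, and idempotent,
        # which is what lets replicas converge without coordination.
        replicas = set(self.counts) | set(other.counts)
        return GCounter(
            {r: max(self.counts.get(r, 0), other.counts.get(r, 0)) for r in replicas}
        )


# Two replicas accept increments independently, then exchange state.
a = GCounter()
b = GCounter()
a.increment("node_a", 2)
b.increment("node_b", 3)
merged_ab = a.merge(b)
merged_ba = b.merge(a)
```

Both merge orders yield the same state, and merging a state with itself changes nothing. Richer types such as sets and maps need more bookkeeping, but they rest on the same convergence property.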
Riak is well known for its choice of availability over consistency in the CAP tradeoff. However, Riak 2.0 now provides support for strongly consistent operations, allowing the choice between eventual consistency and strong consistency on a per-key basis. This new feature is built on top of the Riak Ensemble project, a Multi-Paxos implementation in Erlang, and supports four atomic operations: get, conditional put, conditional modify, and delete. A technical deep dive can be found within the Riak Ensemble repository.
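The semantics of a conditional put can be sketched with a small in-memory stand-in. This is an assumption-laden illustration, not the Riak or Riak Ensemble API: the `VersionedStore` class, its method names, and the integer version scheme are all hypothetical, and a single Python process sidesteps the consensus work a real cluster has to do. The point is only that a write succeeds when the caller's expected version matches what is stored, and is rejected otherwise.

```python
from typing import Dict, Optional, Tuple


class VersionedStore:
    """Single-process stand-in for a strongly consistent key-value store."""

    def __init__(self) -> None:
        # key -> (version, value)
        self._data: Dict[str, Tuple[int, str]] = {}

    def get(self, key: str) -> Optional[Tuple[int, str]]:
        return self._data.get(key)

    def conditional_put(
        self, key: str, value: str, expected_version: Optional[int]
    ) -> bool:
        """Write only if the stored version equals expected_version.

        Pass None as expected_version to require that the key is absent.
        """
        current = self._data.get(key)
        current_version = current[0] if current is not None else None
        if current_version != expected_version:
            return False  # someone else wrote first; the caller must re-read
        next_version = 1 if current_version is None else current_version + 1
        self._data[key] = (next_version, value)
        return True


store = VersionedStore()
created = store.conditional_put("cart", "apples", None)        # key absent: accepted
updated = store.conditional_put("cart", "apples,pears", 1)     # version matches: accepted
stale = store.conditional_put("cart", "apples,plums", 1)       # version is now 2: rejected
```

A rejected write leaves the stored value untouched, so a client that loses the race re-reads and retries rather than silently creating a conflicting copy.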
Authentication and authorization are a critical part of any data storage system. Prior to Riak 2.0, the security best practice consisted of deploying Riak on a trusted network and restricting access using firewall/routing rules. To address this limitation, Riak 2.0 now supports restricting access to a wide variety of Riak’s functionality, including accessing, modifying, and deleting objects, changing bucket properties, and running MapReduce jobs.
Riak 2.0 introduces dotted version vectors as an improvement to its version vectors implementation. Previous versions of Riak are susceptible to a problem called sibling explosion, where retried or interleaved writes can cause the number of sibling values to grow without bound. Dotted version vectors address the sibling explosion problem by capturing additional causal information about writes, which allows duplicate values to be identified and removed. This, in turn, allows the number of sibling values to be bounded to a number proportional to the number of concurrent updates.
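The difference can be illustrated with a deliberately simplified, single-node model. It is a sketch under stated assumptions, not Riak's implementation: the classes `PlainVectorKey` and `DottedKey` and the helper `interleave` are hypothetical, and each write is assumed to return the causal context of the state the node holds afterwards, as if the client had read the result back. Two clients take turns writing with their own last-known context, so every write is slightly stale.

```python
from typing import Dict, List, Tuple

Context = Dict[str, int]  # version vector: node id -> highest counter seen


class PlainVectorKey:
    """One key on one node, tracked with a single shared version vector."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.counter = 0
        self.siblings: List[str] = []

    def write(self, value: str, context: Context) -> Context:
        if context.get(self.node, 0) >= self.counter:
            # The writer saw everything stored here, so it supersedes it all.
            self.siblings = []
        # Otherwise the shared clock cannot say which siblings the writer
        # saw, so every existing sibling has to be kept.
        self.counter += 1
        self.siblings.append(value)
        return {self.node: self.counter}


class DottedKey:
    """Same key, but each sibling remembers the dot (counter) that created it."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.counter = 0
        self.siblings: List[Tuple[int, str]] = []  # (dot counter, value)

    def write(self, value: str, context: Context) -> Context:
        seen = context.get(self.node, 0)
        # Drop exactly the siblings whose dot the writer's context covers.
        self.siblings = [(dot, v) for dot, v in self.siblings if dot > seen]
        self.counter += 1
        self.siblings.append((self.counter, value))
        return {self.node: self.counter}


def interleave(store, rounds: int) -> int:
    """Clients A and B alternate writes, each using its own last context."""
    ctx_a: Context = {}
    ctx_b: Context = {}
    for i in range(rounds):
        ctx_a = store.write(f"a{i}", ctx_a)
        ctx_b = store.write(f"b{i}", ctx_b)
    return len(store.siblings)


plain_count = interleave(PlainVectorKey("n1"), 5)
dotted_count = interleave(DottedKey("n1"), 5)
```

In this model the plain version keeps every value ever written, while the dotted version settles at one sibling per concurrent writer, which is the bound the paragraph above describes.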
Prior versions of Riak used an Erlang-specific syntax for their configuration files. However, this syntax is unintuitive for operators not familiar with Erlang and is difficult to work with using automated deployment tools. To address these issues, Basho created the cuttlefish project, which Riak 2.0 uses for its configuration. The project supports a simple key-value configuration syntax. At startup, cuttlefish reads in a file in the new syntax, generates a file in the old syntax, and then starts the actual Riak process with the generated configuration.
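As a rough picture of why a flat key-value syntax is friendlier to tooling, the sketch below parses dotted `key = value` lines into a nested structure. It is not cuttlefish itself and does not reproduce how cuttlefish generates the Erlang-syntax file; the function name `parse_flat_config` is hypothetical and the sample settings are meant as illustrative values only.

```python
from typing import Any, Dict


def parse_flat_config(text: str) -> Dict[str, Any]:
    """Turn 'a.b.c = value' lines into nested dictionaries.

    Blank lines and '#' comments are ignored; values are kept as strings.
    """
    nested: Dict[str, Any] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        *parents, leaf = key.strip().split(".")
        node = nested
        for part in parents:
            # Walk down, creating intermediate levels as needed.
            node = node.setdefault(part, {})
        node[leaf] = value.strip()
    return nested


# Illustrative settings only.
sample = """
# node settings
ring_size = 64
listener.http.internal = 127.0.0.1:8098
listener.protobuf.internal = 127.0.0.1:8087
"""

config = parse_flat_config(sample)
```

Because every setting sits on its own line, deployment tools can template, diff, or override individual values without having to understand nested Erlang terms.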
Riak 2.0 is available now under the Apache 2 license and comes with a rich set of documentation.