Riak NoSQL Database: Use Cases and Best Practices
Riak is a key-value NoSQL database that can be used to store user session-related data. Andy Gross from Basho Technologies recently spoke at the QCon SF 2011 conference about Riak use cases. InfoQ spoke with Andy and Mark Phillips (Community Manager) about Riak's features and best practices for using it.
InfoQ: What are the primary use cases for using a Riak database compared to a relational database as well as compared to other NoSQL databases?
Basho Team: Riak is purpose-built for soft, real-time systems where availability is a priority. Use cases include (but are not limited to):
- Session Storage
- User Data storage
- S3-like services
- Cloud infrastructure
- Scalable, low-latency storage for mobile apps
- Critical Data Storage and Disaster-Proof Medical Data (see the Denmark healthcare use case)
- Building Block for custom-built distributed systems
We tend to see people switching to Riak from MySQL and Oracle when they a) have been forcing a key/value data model onto a relational database, b) need to reduce costs, c) need to get away from a fragile, sharding-based scale-out model, or d) all of the above.
As far as other NoSQL databases go, Voldemort is the closest to Riak in terms of functionality. Cassandra is somewhat similar, but it's better suited to applications that don't require the flexible consistency that Riak offers. We see a lot of people switching from MongoDB to Riak to reduce operational complexity; many companies launch on MongoDB or Redis but switch to Riak when the cost of operating those systems at scale becomes prohibitive. (Keep in mind that some reworking of application design and data model is typically needed to facilitate this, but it's well worth it in the long run for Riak's stability.)
InfoQ: What type of data persistence and data management patterns does Riak support?
Basho: We place a lot of importance on persistence and predictability. To that end, we support pluggable backends that are suitable for different use cases.
- The default storage engine is Bitcask (which we wrote), and it is purpose-built for low-latency access.
- As of 1.0, we support Google's LevelDB, and this is used as the backend for our secondary indexing component.
There are also several other backends that ship with Riak, and some people have written custom backends for their own use cases (we strive to keep the backend API easy to work with). You can also use more than one backend in the same cluster, e.g., Bitcask for sessions and LevelDB for data that is indexed.
Most important to remember is that Riak remains performant even with datasets larger than RAM.
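The backend-per-bucket idea above can be sketched in a few lines. This is a hypothetical, pure-Python illustration of the concept (the class names, `put`/`get`/`range` methods, and routing table are all made up for this example, not Riak's actual backend API):

```python
class MemoryBackend:
    """Stands in for a low-latency engine like Bitcask."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class IndexedBackend(MemoryBackend):
    """Stands in for an ordered engine like LevelDB that can serve range scans."""
    def range(self, start, end):
        return sorted(k for k in self._data if start <= k <= end)


class MultiBackendStore:
    """Routes each bucket to its configured backend, the way one Riak
    cluster can run Bitcask and LevelDB side by side."""
    def __init__(self, routing):
        self.routing = routing  # bucket name -> backend instance

    def put(self, bucket, key, value):
        self.routing[bucket].put(key, value)

    def get(self, bucket, key):
        return self.routing[bucket].get(key)


store = MultiBackendStore({
    "sessions": MemoryBackend(),       # latency-sensitive data
    "indexed_data": IndexedBackend(),  # data that needs range queries
})
store.put("sessions", "user:42", {"cart": ["sku-1"]})
store.put("indexed_data", "2011-11-01", "event-a")
store.put("indexed_data", "2011-11-03", "event-b")
print(store.get("sessions", "user:42"))
print(store.routing["indexed_data"].range("2011-11-01", "2011-11-02"))
```

The design point is that the key/value interface stays uniform while the engine behind each bucket is chosen to match that bucket's workload.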
InfoQ: Can you talk about some limitations of Riak and the use cases for which it's not the best solution?
Basho: Since Riak is a key/value store at its core, applications that require ad-hoc querying or heavy analytical processing can be difficult to implement on top of it. Our main focus is predictability and scale, and staying faithful to that focus requires some tradeoffs in data model and queryability.
That said, we plan to enhance Riak in various capacities to address these use cases in 2012. Riak already exposes deeper query possibilities via our MapReduce, Secondary Indexing, and Search components, and we'll continue to make these more robust in future releases.
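To make the secondary-indexing idea concrete, here is a small, hypothetical sketch of the concept: alongside each key/value write, the store maintains index entries so objects can later be found by an attribute rather than only by key. The `IndexedKV` class and its method names are illustrative, not Riak's actual 2i API:

```python
from collections import defaultdict

class IndexedKV:
    def __init__(self):
        self.data = {}
        self.index = defaultdict(set)  # (field, term) -> set of keys

    def put(self, key, value, indexes=()):
        """Store a value and register any index entries supplied with the write."""
        self.data[key] = value
        for field, term in indexes:
            self.index[(field, term)].add(key)

    def query(self, field, term):
        """Return the keys whose writes carried this index entry."""
        return sorted(self.index[(field, term)])


kv = IndexedKV()
kv.put("user:1", {"name": "ann"}, indexes=[("state", "CA")])
kv.put("user:2", {"name": "bob"}, indexes=[("state", "CA")])
kv.put("user:3", {"name": "cat"}, indexes=[("state", "NY")])
print(kv.query("state", "CA"))  # keys for the two California users
```

Note that the application supplies the index entries at write time; the store never inspects the values themselves, which is what keeps this cheaper than ad-hoc querying.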
InfoQ: What are some best practices and gotchas that the application architects and developers should keep in mind when working on applications that access the data stored in Riak databases?
Basho: Riak runs reliably on both bare metal and cloud environments. Most clouds have relatively low I/O capacity compared to bare metal, so capacity planning should be emphasized when deploying on something like AWS or Rackspace.
Modeling applications as keys and values can be difficult for architects who are used to the relational model. Spend a lot of time thinking about your data model and access patterns. Riak may not be a fit (and we're not afraid to tell you it's not). But if it is (which tends to be around 70% of use cases in our experience), you'll be delighted with what it offers. (One of our users once stated, "Just about every application at scale becomes a key/value store.")
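One common rethinking exercise is turning a relational one-to-many (say, users and their orders) into keys and values: instead of a JOIN at read time, the "join" is materialized on the user object as a list of order keys. A minimal sketch, with a plain dict standing in for a bucket and all key names invented for illustration:

```python
import json

store = {}  # stand-in for a key/value bucket

# Write each order under its own key...
store["order:1001"] = json.dumps({"user": "u:7", "total": 30})
store["order:1002"] = json.dumps({"user": "u:7", "total": 12})

# ...and keep the list of order keys denormalized onto the user record,
# so no JOIN is needed later.
store["u:7"] = json.dumps({"name": "alice",
                           "orders": ["order:1001", "order:1002"]})

# Access pattern: one get for the user, then one get per order key.
user = json.loads(store["u:7"])
orders = [json.loads(store[k]) for k in user["orders"]]
print(sum(o["total"] for o in orders))
```

The tradeoff is the classic one: reads become a handful of cheap key lookups, while writes must keep the denormalized list on the user record up to date.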
MapReduce is a powerful tool in Riak, but it's not meant to run over all your data at once. MapReduce in Riak is designed to run over small key ranges and should be used to serve data for real-time requests.
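The advice above can be sketched as a toy map/reduce job that is fed a bounded, explicit key range rather than the whole bucket. This is a pure-Python illustration of the pattern (the `mapreduce` helper and the key naming scheme are invented for the example, not Riak's MapReduce API):

```python
# A bucket of daily hit counters, keyed by date.
bucket = {f"hits:2011-11-{d:02d}": d * 10 for d in range(1, 31)}

def mapreduce(store, keys, map_fn, reduce_fn):
    """Map phase runs per key over the supplied range only;
    reduce phase combines the mapped values."""
    mapped = [map_fn(k, store[k]) for k in keys]
    return reduce_fn(mapped)

# Run over the first week of November only, not the full bucket.
week = [f"hits:2011-11-{d:02d}" for d in range(1, 8)]
total = mapreduce(bucket, week, lambda k, v: v, sum)
print(total)  # 10 + 20 + ... + 70 = 280
```

The point is that the job's cost is proportional to the key range you hand it, which is what makes it usable for serving real-time requests.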
InfoQ: What is the current tool support for data modeling and application development using Riak database?
Basho: Aside from client support for virtually every major programming language, there are also various ORM libraries and frameworks for languages like Ruby, Python, PHP, and Node.js. And there are various open source tools that let you inspect and tweak the data stored in Riak via a GUI. We'll be releasing more code to make this even easier in future releases of Riak.
InfoQ: What is the future road map of Riak database in terms of new features?
Basho: Moving forward you'll see a lot of work from Basho focusing on usability, core stability, and better support for globally distributed data storage. We will also continue to expand the queryability and flexibility available to developers who are using Riak. Speed is also high on our list of priorities. We know Riak isn't the fastest database available, but it won't be that way forever.
Also, riak_core, the framework that powers Riak's distributed capabilities, and riak_pipe, the framework that powers Riak's MapReduce, will continue to be developed and made more extensible.