Bio Justin Sheehy is the CTO of Basho Technologies, the company behind the creation of Webmachine and Riak. Most recently before Basho, he was a principal scientist at the MITRE Corporation and a senior architect for systems infrastructure at Akamai. At these companies he focused on aspects of robust distributed systems, including scheduling algorithms, language-based formal models, and resilience.
The Erlang Factory is an event that focuses on Erlang - the computer language that was designed to support distributed, fault-tolerant, soft-realtime applications with requirements for high availability and high concurrency. The main part of the Factory is the conference - a two-day collection of focused subject tracks with an enormous opportunity to meet the best minds in Erlang and network with experts in all its uses and applications.
Sure, I’m the CTO of Basho Technologies. Our product is Riak, which I think we’ll talk a little more about as well as a great deal of other open source software we've released. Prior to Basho I spent time doing research for the US government and I spent several years at Akamai.
Riak is a database and it's inspired by Amazon's Dynamo system, though it certainly is its own system; it is a distributed, scalable, fault tolerant system, meaning that it's fundamentally about taking advantage of many computing systems. You can add capacity or throughput by adding computers and, if any of your individual systems should die or be taken out of commission, you don't really have to worry about that.
The Dynamo Paper's publication in 2007 was really one of the sparking moments that created what's now called the NoSQL tradition, even though it's a strange name for a category. The Dynamo Paper was really the people of Amazon taking a significant amount of established prior art (things like Gossip protocols and vector clocks and things that had been around for decades) and showing that one particular way of assembling this collection of ideas could produce a distributed system with some very desirable properties.
In Amazon's case, it was in order to be able to have a shopping cart that you could always put things into, but the general properties that that enabled were very much of general interest. That paper really set off a bunch of people realizing that real businesses running the most important part of their business were sometimes making the choice to not use the default database of the day, to use Oracle or MySQL or whatever. Those systems are good, those systems solve many problems but that they are not the only and sometimes not the best way to solve your business problems.
One thing that is a common misconception people have after reading the Dynamo Paper is that it's a direction to implement the system and it's really not. When someone actually goes and tries to build a Dynamo-like or Dynamo-inspired system, they quickly discover that this is a description of how these people chose to put these things together and what they used. But it's very far from an implementer's manual.
Even just the Dynamo specific parts are very dramatic in differences. There have been a number of Dynamo-like systems developed over the past few years, each of which has had to design and implement large portions of even just the Dynamo-like sections on their own. Because Dynamo tells you what some very good design decisions are but it doesn't show you how to implement the system. Even just the Dynamo portion you have to do a lot of design work, just to implement that.
That's why Riak is different from Voldemort, is different from all the other things even in their Dynamo elements. But then, to make that work, one of the things that we made as a choice in Riak was to make the layers around the Dynamo elements in our software be very simple, so that we've been able to do things like implement new storage engines without having to change any of the rest of the Dynamo related code.
We’ve changed more than once and provide options to our users so that we can innovate in storage, we can innovate at the client API level and none of these required changes to the Dynamo-like parts of Riak.
Yes, certainly. Our first and most visible customer initially was Mochi Media using it for a very large Ad network for flash games. There are others - Comcast Interactive Media uses Riak and Mozilla is beginning to roll out Riak and there are several others. We're running not only with users but with many customers at this point with Riak.
I think one of the differences between a key/value store and what a lot of people call document databases these days is whether or not there is any ability to examine inside the values being stored. Most of Riak treats the value stored in it as entirely opaque binary data. You store key/value and it puts the values in. But there are aspects of Riak that allow you to have Riak do some programming or processing for you on your data.
Obviously once you start specifying functions that will operate on your data you are then implicitly assuming if not schema then at least some preconditions on your data. For instance, one of the things you can do in Riak that’s well beyond basic key/value access is MapReduce processing over the data that's stored in Riak. While Riak's "get" and "put" and "delete" sort of operations don't care about what's in the value, to run a MapReduce job the functions running the "map" and "reduce" phases obviously have some expectation of what data they'll be reading.
Different people can write those with different amounts of tolerance. You can write one that will entirely fail on data it doesn't understand or you can write such functions that simply ignore data that they don't understand. Riak doesn't really make that choice for you.
That's a very good question. One of the things that we modeled the initial applications first using Riak around was the notion of the web as a very capable distributed system itself. One of the defining features of the World Wide Web is that of links, of hypertext. One of the most common operations for expressing relationships between data and Riak is by including links in that data.
One of the ways to do this is through an HTTP link header, which a draft standard, but it's not the only way. This is something that you can configure, how links are structured in your data. One of the ways to access data in Riak is to issue a query in terms of links so you can say "Given this starting document, follow all the links from that document that match some tag specification".
Then you can issue this with multiple stages. You can say "From each of those follow all the links off of each of them that match some other tag specification" and so on. So you can find the leaves or even the middle of this sort of a tree. The thing that’s very easy to issue this sort of link query is you can do in a very easy HTTP GET. It's a very simple syntax; you can do it from any language.
What is actually happening under the hood when that gets run is each of those link phase queries at each level is getting transformed into a MapReduce job. Link walking is really just a very simple subset of the things you can do with MapReduce. It turns out to be a very useful one. There are lots of people that don't ever have to write their own map functions.
There are 2 primary interfaces. One of them is over an HTTP connection and for that we use Webmachine, which is also open source software that Basho created and released to provide a very well-behaved, very standards compliant HTTP interface. One of the nice things about that is that everything out there already knows how to work well with HTTP. Every programming language has libraries for it, caches work well with it, and it’s very ubiquitous, very scalable, and very robust.
That's one of the 2 ways. For clients that require higher throughput, that are really trying to use Riak as a very high throughput database, using HTTP is not optimal because it's a very verbose protocol. We also expose the exact same underlying interface over a TCP protocol buffer's interface, using the Google created protocol buffer serialization and a simple protocol that we wrote, using that serialization over TCP.
For people that don't need all the nice conveniences of HTTP, that would rather have just more data flow over the pipe, they can use protocol buffers. In fact you can switch back and forth between these interchangeably, you can put data in through protocol buffers, and you can take it out using HTTP and vice versa.
Yes, that’s correct. Riak is almost entirely Erlang.
There was a really natural choice because especially when you look at the Dynamo model, where they talk about all these operations where to get a value you'll send messages to multiple other parties, then you'll wait through various phases for responses of different classes to come back and the basic building blocks to do that kind of messaging and to do that kind of more complex state machine are there for you out of the box for you in Erlang.
Obviously, you can write this kind of program in any language. People have done it in Java, you can do so, but between Erlang the language (having it's very nice messaging simple semantics) and OTP the support libraries, (having incredibly powerful constructs like finite state machine libraries) it really gave us tools to both write the software quicker and to believe that some of the most complex parts of it were far more likely to be robust right out of the box.
We refer to it as an open core model and that's very different from a dual licensing kind of situation, where Riak, everything you need as a developer to run and use Riak in full is entirely open source available under the Apache 2 license. We encourage people to use it and do whatever they like with it.
There are some additional applications and some additional code that integrates well with that core that we've written that is part of our enterprise offering. When you buy the enterprise version of Riak you get the open core and you also get a number of auxiliary capabilities. They're usually things that developers don't require to do their prototyping but that a business will often insist on to have the revenue depend on this.
Things like monitoring via SNMP or JMX, a web administrator's console, long haul multi data center replication - these sorts of things are on the enterprise product. They all rest on top of the exact same core open source product that's available to everybody.
We've learnt quite a bit from several of our clients, first being Mochi Media. They really helped us with making Riak very flexible operationally for people who thought about operations differently than we did. Comcast similarly has been putting Riak through the paces in a few different ways. One of the more interesting studies that happened recently is the Mozilla team doing some of their Test Pilot work, they actually blogged about this.
Daniel on that team was one of the people of evaluating how to manage and store these large amounts of very short term data that arrives in these bursts during their test pilots and what data stores to use. They analyzed a number of different data stores: Riak, Cassandra, HBase and other things and I think for various reasons including the robustness of the software, the type of interface using HTTP, our whole model, they ended up choosing Riak.
But I think one of the things that that really showed us is that this is very little about beating all these other sorts of systems and much more about helping people realize (much as the Dynamo Paper helped some people realize) that different business needs require different systems.
Often, a complex business will end up with a hybrid system. We would not find it surprising. In fact this is the case at Mozilla, at Comcast, at Mochi Media and other customers. People often run more than one system for storing and retrieving data. One of the things we really appreciated about Mozilla's approach is they analyzed their needs for this particular project and application. We helped them initially learn how to set up clusters and so on, but they learnt quickly that it was easy and they didn't need our time.
They did some very aggressive, high throughput, many day benchmarks and they learnt that for the needs of that application, Riak was the right data store. It's been very rewarding to help people be more rigorous about their business needs. It used to be you pick Oracle if you have an Oracle license or MySQL otherwise and you build whatever you can build around that.
One of the most rewarding things about this is we've been giving people the tools not only to solve problems differently but to evaluate their problems, to figure out which solutions really work for them.
One of the things that we learnt when building Riak was that as I said before, we built this distributed system at the core and a very simple to use key/value store around it. It's long been understood wisdom in most of computing that if you need a big distributed system to solve whatever your business problem is, it probably has to be a one-off solution because that's the nature of these things.
We've started to chip away at that and believe that's not quite as purely true as it once was. Right now, we're building our next product Riak Search, using the exact same distributed system core and fundamentals that are in Riak. You pare away all of the key/value storage and database aspects.
But we are using what we learnt in building Riak and how to solve some of these distribution problems and how to solve some of these operations problems of managing machines failing, machines being added and all these things dynamically, and building something that people can use in the same ways that they use Lucene or Solr today but with the same operational properties and scaling properties of Riak.
Search won't be the last thing that we do that way. It's just the next step.
There are really 2 immediate ways to compare different data storage systems. You can compare them by data model or in some cases by their distribution model. In data model terms, Riak fits into either in the key/value or the document store category, depending on which way you look at things and there are a number of other things like that. In the key/value area, there are things that have been around for a very long time, like Berkeley DB and so on.
That's a very simple, very powerful data model, but there are a number of other systems with different data models that are very rich and powerful. MongoDB has this deeply nested document model and things like Cassandra have this column data model that's more about naturally commutative data. One of the first things someone should look at when comparing these systems is which of these data models fits, whether it's the key value document model or our column commutative sort of model or a schema driven traditional database model - all these are roughly equivalent in nature.
You should look at them all separately and see which of those fits your problem the best. That's one of the primary ways by which people, for their own comparison purposes ought to narrow the field and look at which data stores suit their problems in terms of data model.
Another fundamental thing that differentiates some of these data stores is their distribution model. There are really a few different kinds of models. You can be purely single server and you can get a great deal of simplicity and efficiency out of a system that really only runs on one computer. A good example is Redis that is now preparing to add clustering. But in its current form is very much a single system, single thread even data store and it's fantastic.
It's a very powerful, very fast way to manage your data, but it's not a distributed system. It's great at what it does, but then if you look at things like Riak and Cassandra and Voldemort and so on, all are very different in some ways but all share in common that they're fundamentally distributed systems.
They add more to what they can do by adding more machines and they use the capacity and the processing power on a cluster basis and not on a single machine basis. Between those 2 there are other aspects. What you can do with MySQL and many other systems is build, what's not a cluster based distributed system, but you can do things like master-slave replication.
In some cases, like CouchDB, you can do this peer-to-peer master-to-master replication. So, there are lots of different distribution models. Much like the people writing an application need to look at the data model to decide which things might fit their needs, the people making operational and business decisions should look at what their needs are for robustness, for scaling, for cost and figure out what are the distribution models of different things for their needs.
It's really between those 2 things you can often narrow down which of these different systems you are most interested in.
Until the past few years it was really the norm that when you started any project that was going to be storing and using data of almost any kind, your first step was to reach for your standard table-based, schema driven, relational database, whether that's MySQL or Postgres or Oracle. Those technologies have solved a lot of problems very well for many people.
For one thing, NoSQL has nothing to do with the SQL language for data access. That's why the name is a strange fit, but it's the one we've got. What I think is really the defining feature of this movement or phenomenon is not about data access languages like SQL and it's not even about any specific technological approach.
Some people think it must be about document databases, it must be about scaling big or something like that, but I think it's about the growing recognition that all the parties in this area are helping to put forward. That you should in fact, just like you should choose your web framework, your programming language and all these things based on the business needs of what you're building that the same should apply to your database.
And that the ways that you store and manage your data there are a lot of choices. The NoSQL movement is really about making it more obvious and more available for people to make business decisions about what they are building because there is this variety is coming up again. Now you can choose, you can use a column database, you can use Riak, a Dynamo-like key/value store.
You can use all these things if they fit your needs better than Oracle or MySQL or whatever you might have chosen by default before. It's not about replacing any other older technology; it's about making it more obvious that people should analyze their data storage needs with the same level of rigor that they analyze the rest of their needs when building software. They should have choices available to them that meaningfully fill those needs.
I actually think it’s very interesting because it’s true that some of the biggest websites have pushed a lot of this forward, but I don’t actually think that’s because these technologies are more suited to the web than they are to other things. I think that the whole NoSQL movement is about providing choices to everybody that needs software and needs data storage and so on.
The reason we’ve seen it first in big web organizations is because those companies (whether you’re talking about Amazon or Twitter and so on) have very different businesses but they have in common a few things. One of them is web operations. It is about managing your entire system, which includes your data.
In responsiveness to the kinds of behavior that you see from the users of the web and the entire Internet using your web service exposes some of these business needs more immediately and more deeply and forces you to make some of these decisions a little earlier than people in some other industries are.
It’s also true that many web companies are more willing, whether it’s trough competitive pressure or attitude or something else, to make newer riskier choices earlier than people in other industries. In some cases, the adoption is out of need.
Amazon needed a way to have an always writable shopping cart and that’s what started some of this. One important lesson that they had clearly learned and that everybody else could get is "Don’t ever stop a customer from giving you money!" But everybody’s got all these different needs and some of them are being driven to NoSQL solutions through that analysis.
Some of the web businesses, the reason they go there first, even if their needs are really equivalent to someone writing entirely different sorts of software because, you tend to see much quicker uptake of new languages, new everything. It bears in mind that you’d see new databases as well getting picked up sooner by web companies.
Thank you. It’s been my pleasure.