Bio: Rich is the author of Clojure and designer of Datomic (http://datomic.com/) and has over 20 years of experience in various domains. Rich has worked on scheduling systems, broadcast automation, audio analysis and fingerprinting, database design, yield management, exit poll systems, and machine listening, in a variety of languages.
Software is changing the world; QCon aims to empower software development by facilitating the spread of knowledge and innovation in the enterprise software development community; to achieve this, QCon is organized as a practitioner-driven conference designed for people influencing innovation in their teams: team leads, architects, project managers, engineering directors.
1. We’re here at QCon San Francisco 2012 and Rich gave a talk yesterday on deconstructing the database. My first question is basically can you introduce yourself to our audience, just kind of tell us who you are?
Sure; as you said, I’m the author of Clojure and more recently of Datomic which is a new database and prior to that, I was a practitioner for 20 years building all kinds of systems, election systems and scheduling systems and things like that.
Well, a bunch of different pressures, I think, from having used database systems in the past. Mostly the fact that using traditional, monolithic databases especially is just really complex for application developers, and by that I don’t mean too hard or not easy, but that too many concerns were intertwined. So I was thinking about separating them and accomplishing a few things: one, giving programmers more of an information model for how to use data; another, bringing declarative programming, the kind of programming that SQL servers can do, into the application layer, because we don’t really have those kinds of tools in our applications; and then making a database that was a system, one that used other components to build a database and could leverage existing storages and things like that.
Sure; so that’s an example of that systems approach. In Eric Brewer’s keynote he talked about the monolithic approach, which is usually where the relational world comes from, and then the systems approach, which is the way the NoSQL movement is going. Datomic is very much in that systems model of saying that things like query, transaction coordination and storage should all be independent services that you tie together to make a database. By taking that approach we’re actually able to say, you know, we shouldn’t be in the storage business; there are already some very high quality storage engines with really nice properties and different tradeoffs, like DynamoDB or Riak or even SQL databases. So by treating storage as a component and as a service, you can take the same database logic and host it on top of different storage engines; that’s what we do.
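The idea of treating storage as a swappable service can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Datomic's actual storage interface: the names `Storage`, `InMemoryStorage` and `SegmentStore` are hypothetical, and addressing segments by a SHA-256 content hash is a choice made for this sketch.

```python
import hashlib
from typing import Dict, Optional, Protocol


class Storage(Protocol):
    """Hypothetical minimal contract that any storage backend must satisfy."""

    def put(self, key: str, value: bytes) -> None: ...

    def get(self, key: str) -> Optional[bytes]: ...


class InMemoryStorage:
    """Stand-in backend; a DynamoDB, Riak or SQL adapter would expose the same two methods."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._blobs[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._blobs.get(key)


class SegmentStore:
    """Database-side logic that only knows about the Storage contract."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def write_segment(self, data: bytes) -> str:
        # Assumption for this sketch: segments are immutable and keyed by content hash.
        key = hashlib.sha256(data).hexdigest()
        self._storage.put(key, data)
        return key

    def read_segment(self, key: str) -> Optional[bytes]:
        return self._storage.get(key)
```

Because `SegmentStore` depends only on `put` and `get`, the same database logic could be hosted on any backend that offers those two operations.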
4. And you do a lot of abstractions like that; you’re abstracting in a lot of different ways. So Datomic takes advantage of immutability treating basically you talked about facts and then what changes is novelty; so can you kind of tell us about the data model?
Sure; so that’s what I was talking about with the information model. I think that traditionally databases have had what I would call a place-oriented model, where you say, whether it’s in a document or in a column in a row, there’s a place for Joe’s email address, and if Joe has a new email address, you go there and you replace one with the other. And that’s not the way we used to manage information before we had computers; you know, we didn’t have cubby holes where we kept facts and replaced them; we kept ongoing records, ongoing logs. So an information approach says that if you change your email address, you haven’t actually modified an email address and turned it into another one; it’s just that there’s a new fact, that you’re now using this email address, and the fact that you were using another email address in the past isn’t something that should go away. So an information model says just keep accumulating new facts. When you do that, you realize that all of the old facts can stay and they never get changed in place, and therefore you can use data structures that are immutable to represent them, and that gives you all kinds of architectural benefits.
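The difference between replacing a value in place and accumulating facts can be shown with a small Python sketch. The names `Fact` and `FactLog` are hypothetical and this is not Datomic's data model or API; it only assumes that each fact records an entity, an attribute, a value and the transaction number at which it was asserted.

```python
from typing import Any, List, NamedTuple, Optional, Tuple


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: Any
    tx: int  # transaction number at which the fact was asserted


class FactLog:
    """Accumulate-only store: facts are appended, never updated in place."""

    def __init__(self) -> None:
        self._facts: Tuple[Fact, ...] = ()  # immutable tuple, replaced by a longer one on append
        self._next_tx = 1

    def assert_fact(self, entity: str, attribute: str, value: Any) -> Fact:
        fact = Fact(entity, attribute, value, self._next_tx)
        self._facts = self._facts + (fact,)
        self._next_tx += 1
        return fact

    def current(self, entity: str, attribute: str, as_of: Optional[int] = None) -> Any:
        """Latest value, optionally as of an earlier transaction number."""
        matches = [
            f for f in self._facts
            if f.entity == entity and f.attribute == attribute
            and (as_of is None or f.tx <= as_of)
        ]
        return matches[-1].value if matches else None

    def history(self, entity: str, attribute: str) -> List[Any]:
        """Every value ever asserted, oldest first."""
        return [f.value for f in self._facts
                if f.entity == entity and f.attribute == attribute]


log = FactLog()
log.assert_fact("joe", "email", "joe@old.example")
log.assert_fact("joe", "email", "joe@new.example")
```

After the second assertion, `log.current("joe", "email")` gives the new address, while `log.current("joe", "email", as_of=1)` and `log.history("joe", "email")` still see the old one, because nothing was overwritten.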
Michael's full question: We talked about ACID transactions, and that refers to the properties that ensure reliability, essentially, but you yourself have noted that there are problems with scaling ACID, such as unlimited write scalability. Can you quickly tell us what ACID is, just for the audience? And then my question is: how does Datomic manage these kinds of tradeoffs?
So ACID is the acronym - Atomic, Consistent, Isolated and Durable - and it was applied to databases and it’s a valuable property. I think the problem is less with ACID than with people not realizing that it’s a choice you make and that it has tradeoffs, and with the fact that it’s typically been embodied in a monolithic kind of server. Datomic is actually ACID on the transactional side and does take some of those tradeoffs; it does take the tradeoff, for instance, about write-scalability and write-availability, but it’s not monolithic. So that’s where it sits differently: you have traditional ACID databases that were monolithic, and you have NoSQL, which is not ACID and is eventually consistent but very distributed, and there’s a whole spectrum in between, and Datomic actually targets an in-between point between those two. It says you really can make independent decisions about writes, and in the case of Datomic we say we like ACID transactional writes, and a different decision about reads, especially when they’re no longer co-located. And if you use immutability, you’re able to say I can put my readable data inside a distributed storage engine like Dynamo or Riak and get those properties on the read side - high availability, elasticity, redundancy and distribution.
Michael's full question: Yesterday you mentioned Eric Brewer; he talked about the CAP theorem, and he’s the man. Basically, he clarified his position on the CAP theorem, saying that it’s only a problem for 100% consistency and 100% availability. He also kind of implied that for real-world problems you have no other choice than BASE, basically, if you want to scale. How does Datomic address this problem?
I’m not sure that’s really what his intention was; I think what he was trying to say was that there’s much more spectrum and that in fact you can get a lot of properties almost all the time, if you become aware of when you’re in a partitioned state; and then the question becomes when you’re partitioned, what do you do. Datomic is actually fairly traditional in its approach to the write side and saying it has the normal availability semantics associated there which means it’s not available in the case of partitions; so it’s a tradeoff so we make that tradeoff. What’s different about Datomic is that because you don’t co-locate your read services with your transactional services, like you don’t go through the transaction coordinator to read and you don’t need to read inside transactions to see consistent data, you now can make an independent decision for that; so I don’t know if you saw Mike Nygard’s talk that he had - Ten Loopholes in CAP - I mean not that they’re actually loopholes but there’s different ways of thinking about consistency that allow you to retain it most of the time.
7. Some of them were pretty funny, they weren’t loopholes. So I mentioned abstractions, Datomic makes several; one is the abstracting of the query language out of the database basically; can you talk about the query engine for us?
Sure; so again, the idea is that storage, transaction coordination and query are all sort of a la carte pieces, and once they are, you’re able to say query doesn’t need to sit in a prized server; query is something that can reside in more than one server, and in fact those servers can also have other code running on them. So what we’ve done is say the query engine is something that you can incorporate into your application server, and if your application server has access to storage, which is the architecture of Datomic, then you get scalable query, because as you have more load and more questions to answer you can just add more servers; so that elastic query capability is there. As for the query language, it’s a JVM library that provides Datalog to your application. Datalog is a query language that has the same power as relational algebra, but it’s a little bit better suited to programming language use and can be applied both to databases and to data structures in memory. So when you get the query language and the query engine, you can apply them to Datomic databases, to in-memory data structures such as collections, and to combinations of the two.
So the idea behind Datalog is it has a family relation to Prolog but it’s oriented towards query; it's sort of a subset of Prolog that can’t do everything Prolog can do but as a tradeoff for that it takes a set orientation to the way it resolves queries and therefore is good for query; and you substitute the knowledge base of Prolog with an actual database and so that’s how it works; so Datomic is the knowledge base that its Datalog operates upon.
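To give a feel for what a set-oriented, Datalog-style query over facts looks like, here is a minimal Python sketch. It is not Datomic's query engine or syntax: the `query` and `match_clause` functions are hypothetical, facts are assumed to be plain `(entity, attribute, value)` tuples, variables are strings starting with `?`, and rules and recursion are left out entirely.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence, Set, Tuple

Triple = Tuple[Any, Any, Any]
Binding = Dict[str, Any]


def is_var(term: Any) -> bool:
    """Variables are written as strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match_clause(clause: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `clause` matches `fact`, or return None on mismatch."""
    extended = dict(binding)
    for term, value in zip(clause, fact):
        if is_var(term):
            if term in extended and extended[term] != value:
                return None  # variable already bound to a different value
            extended[term] = value
        elif term != value:
            return None  # constant in the clause does not match the fact
    return extended


def query(find: Sequence[str], where: Iterable[Triple], facts: List[Triple]) -> Set[Tuple[Any, ...]]:
    """Join all `where` clauses against `facts` and return the set of `find` tuples."""
    bindings: List[Binding] = [{}]
    for clause in where:
        bindings = [
            extended
            for binding in bindings
            for fact in facts
            if (extended := match_clause(clause, fact, binding)) is not None
        ]
    return {tuple(binding[var] for var in find) for binding in bindings}


facts = [
    ("joe", "email", "joe@example.com"),
    ("joe", "likes", "pizza"),
    ("ann", "likes", "pizza"),
    ("ann", "email", "ann@example.com"),
    ("bob", "likes", "sushi"),
]

# "Find the email of everyone who likes pizza": the shared ?e joins the two clauses.
pizza_emails = query(
    ["?email"],
    [("?e", "likes", "pizza"), ("?e", "email", "?email")],
    facts,
)
```

The result is a set of tuples rather than an ordered cursor, which reflects the set orientation described above, and the same function works on any in-memory list of tuples, not only on data loaded from a database.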
Sure; so there’s a single writer transaction coordinator in the system and then that’s all it does; all it does is transaction coordination and all transactions are serialized and resolved in memory so you get all of the ACID semantics of transactions; and what constitutes a transaction is just new facts. So submitting a transaction is saying these are the new things that have happened in the world; you’re not really saying change this place, update this table; you’re not saying here’s a revision of this document; you’re saying you know, just very small things like here’s Joe’s new email address. And then, because they’re called out like that, it’s very easy to see in a transaction what’s changing; it’s not sort of a side effect that’s happening; and therefore you can flow that change, those changes around so they go to the transactor and they get stored in the database but they also can get broadcast out to all the application peers we call them; so they get to see what’s new and so the combination of what’s new and what’s already been indexed in storage gives you a complete view of the world.
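The flow described here, with a single writer that serializes transactions and pushes the new facts out to peers, can be sketched as follows. This is an assumption-laden toy, not Datomic's implementation: `Transactor` and `Peer` are hypothetical classes, a lock stands in for serialization, a Python list stands in for storage, and the broadcast is a direct method call.

```python
import threading
from typing import Any, List, Tuple

Datom = Tuple[str, str, Any, int]  # (entity, attribute, value, tx)


class Peer:
    """Application-side view: what was already in storage plus novelty received since."""

    def __init__(self) -> None:
        self.indexed: Tuple[Datom, ...] = ()
        self.novelty: List[Datom] = []

    def receive(self, datoms: List[Datom]) -> None:
        self.novelty.extend(datoms)

    def db(self) -> List[Datom]:
        # Complete view of the world = indexed storage + what's new.
        return list(self.indexed) + self.novelty


class Transactor:
    """Single writer: every transaction is applied one at a time, in order."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._log: List[Datom] = []  # stand-in for durable storage
        self._peers: List[Peer] = []
        self._next_tx = 1

    def connect(self, peer: Peer) -> None:
        with self._lock:
            peer.indexed = tuple(self._log)  # snapshot of what is already stored
            self._peers.append(peer)

    def transact(self, new_facts: List[Tuple[str, str, Any]]) -> int:
        with self._lock:  # serializes concurrent callers
            tx = self._next_tx
            self._next_tx += 1
            stamped = [(e, a, v, tx) for (e, a, v) in new_facts]
            self._log.extend(stamped)
            for peer in self._peers:
                peer.receive(stamped)  # broadcast the novelty
            return tx


transactor = Transactor()
transactor.transact([("joe", "email", "joe@old.example")])
peer = Peer()
transactor.connect(peer)
transactor.transact([("joe", "email", "joe@new.example")])
```

After this runs, `peer.indexed` holds the fact from the first transaction, `peer.novelty` holds the fact from the second, and `peer.db()` combines the two, so the peer can answer reads without going back through the transactor.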
Well, you know, we have users now and they have things that they want. Certainly adding storages is a big deal; we just recently added Riak as a possible backend, and that is certainly something we’ve seen customers want. We have the DynamoDB option if you want to run in the cloud, but if you want to run on premises, Riak is a similar kind of technology that you can run on your own hardware. We’re always looking to do new storages, enhancements to query and more language support.
It’s a partnership between myself and Relevance which is a consultancy and so I work with guys from there and so we have a team that’s working, it’s not all me; but Stu Halloway is definitely my prime partner in developing it; he does all of the hard parts.
No, I mean I encourage everybody to try Datomic; there’s a free version there that’s actually quite capable, a server and it can support a couple of peer servers and I think everybody should try it out; also try it inside their own programs for just in-memory stuff because it’s quite useful.
Michael: Thanks for coming by today.
Sure; thanks for having me.