Rich Hickey on Datomic, Data Storage, Functional Programming and Immutability
Interview with Rich Hickey by Peter Bell on Sep 10, 2012

Bio Rich is the author of Clojure and designer of Datomic (http://datomic.com/ ) and has over 20 years of experience in various domains. Rich has worked on scheduling systems, broadcast automation, audio analysis and fingerprinting, database design, yield management, exit poll systems, and machine listening, in a variety of languages.

   

2. So, the first thing I wanted to ask is that you have got a keynote, “The Value of Values”. It seems to be about introducing some of the benefits. If I am an object oriented programmer and I hear about this functional stuff, with people mentioning Erlang and Clojure and Haskell, why should I care? Does “The Value of Values” try to address that?

Yes, it definitely does. The idea is to step away from some of the lingo associated with functional programming and look very specifically at what we are trying to do with computers, what we are trying to do in building information systems, and how using values will help make us more effective. The talk focuses on a few different things. First it focuses on the small, inside the program: talking about object orientation inside the program, what kind of benefits do you get from using values instead of more stateful things? Those would be easier concurrency, a better ability to communicate things, and an easier time moving components of your system into other processes. Then the talk moves on to values in the large, where I think programmers have a much better understanding of using them. For instance, RESTful interfaces, passing JSON around, and HTTP for inter-process communication are things that people understand, and they understand the architectural value of them. So one of the points of the talk is to challenge people: in the large you see these benefits, but in the small you are making different decisions, and should you be? The same benefits are present inside your program if you start programming with more immutable data structures and things like that.
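To make the "values in the small" point concrete, here is a minimal Python sketch (not from the interview). The Account type and the deposit function are hypothetical names chosen for illustration. The idea is only that an operation returns a new value instead of changing the one it was handed, so anything still holding the old value is unaffected.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    """A value: once constructed, its fields cannot be reassigned."""
    owner: str
    balance: int


def deposit(account: Account, amount: int) -> Account:
    # Return a new value rather than changing the one we were given.
    return replace(account, balance=account.balance + amount)


# Hypothetical usage: `before` stays valid after the deposit.
before = Account(owner="alice", balance=100)
after = deposit(before, 50)
```

Because `before` can never change, it can be handed to another thread or serialized to another process without locking or defensive copying, which is the kind of benefit the answer describes.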

   

3. Do you see a lot of – as people start to think about the functional ways of programming - do you feel that the big change is going to be that people are going to start to use functional languages more or that you are going to see people programming in a more functional style in traditional object oriented languages like Java?

I think you are going to see both. Certainly people who have paid attention to the Java experts have been advised for a long time to use immutability as much as possible, so following that advice has always been a good idea and still is. But I think a language is less about what it absolutely lets you do, because most languages let you do anything in the end, and more about what it makes idiomatic. Choosing a functional language makes a lot of these approaches and techniques idiomatic and therefore more concise and easier, and I think that is where the trade-off lies.

   

4. Great, talking about functional languages, and specifically within Clojure and you did a presentation on Reducers, talking about collection processing, why are Reducers important to programmers?

They are important because they are a path to parallelism, for Clojure users in particular. The challenge for most programming languages now is how to take the techniques that you have been using, and it actually doesn’t matter whether they are functional or imperative. In both camps most of the techniques and approaches that have been used are iterative and sequential in nature; even functional programming languages like Haskell are still primarily based around recursion, inductive data structures and sequential processing. So how do you allow people to retain as much of the programming model they are familiar with, and yet take advantage of the fact that their computers now have more processors, and that therefore the path to making programs faster is to leverage those processors by subdividing the computations and running them in parallel? Reducers are a technique for Clojure users to obtain parallelism through slightly modified versions of map, filter and some of the other classic functions they use. These versions are compatible with Java’s Fork/Join framework, which is a parallelism framework, and will let you get speedups based upon having more cores without changing the shape of your program.
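The shape of that idea can be sketched in Python. This is not the Clojure reducers library and it does not use Fork/Join; the fold function, its parameters and the chunk size are assumptions made for illustration. The collection is split into chunks, each chunk is reduced sequentially, and the partial results are merged with a combining function that must be associative.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add


def fold(combine, reduce_fn, init, items, chunk_size=512, max_workers=4):
    """Reduce `items` chunk by chunk, then combine the partial results.

    `combine` must be associative and `init` must be its identity,
    otherwise the chunked result can differ from a sequential reduce.
    """
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each chunk is reduced independently of the others.
        partials = list(pool.map(lambda chunk: reduce(reduce_fn, chunk, init), chunks))
    return reduce(combine, partials, init)


def sum_of_even_squares(numbers):
    # The filter and map steps are folded into a single reducing function.
    def step(acc, n):
        return acc + n * n if n % 2 == 0 else acc
    return fold(add, step, 0, numbers)
```

The caller still writes an ordinary reducing step, so the program keeps its shape. Note that threads in standard CPython do not run CPU-bound code on several cores at once, so this sketch shows the structure of the technique rather than a real speedup.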

   

5. How can an immutable data store even work?

Peter's full question: Nice, and so we’ve been talking primarily about programming languages, object oriented versus functional, and this focus on trending tools and visibility. But I think as you pointed out is something you see in most languages as becoming a best practice. That brings me to Datomic which seems to be taking the same sense of immutability and moving it to the core of mutability, certainly in most web applications, which is the database. How can an immutable data store even work?

Yes, it works quite similarly to the way we do immutability inside programming languages: when we need to change something, instead of changing it in place, we allocate new space and we put the novelty in the new space. That same technique that you use in memory can be used on disk, and there are tremendous benefits from doing that, like the ability to cache extensively. Again, some of the architectural advantages I talked about in the “Value of Values” talk also apply in the database space: once you use immutability you have more architectural flexibility to locate parts of your system in different places, on different machines, or in different processes. So Datomic takes the model that is used for Clojure and sort of extends it to the database. It also adds a temporal element, which I think is an important characteristic of information processing systems: they maintain time. So there are two things that I think are happening: one is treating things immutably, and the other is maintaining time.

More and more now we see businesses seeking value from data sources like logs, which happen to have kept everything and put time stamps on everything. They find all this value in the logs that they don’t actually have in their databases, because the databases are not keeping everything: they are updating in place, and they are not usually time stamped. I think that the big data pressure on databases is going to be very strong, and people are going to have the same expectations of their core business databases: that they actually keep everything and maintain the time, so that you can do analytics that help you understand your business and what happened.
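A toy Python sketch of those two ideas, never updating in place and maintaining time, might look like the following. It is not Datomic's API or storage format; the History class and its methods are hypothetical. Every transaction puts the novelty into a new snapshot and keeps the old ones, so any past point can be read back directly.

```python
from types import MappingProxyType


class History:
    """Every transaction yields a new read-only snapshot; old ones are kept."""

    def __init__(self):
        # Time point 0 is the empty database.
        self._snapshots = [MappingProxyType({})]

    def transact(self, novelty):
        """Record `novelty` (a dict of key -> value) and return the new time point."""
        current = dict(self._snapshots[-1])  # new space; the old snapshot is untouched
        current.update(novelty)
        self._snapshots.append(MappingProxyType(current))
        return len(self._snapshots) - 1

    def as_of(self, t):
        """Return the database exactly as it was at time point `t`."""
        return self._snapshots[t]


# Hypothetical usage: the old email is still there at the earlier time point.
db = History()
t1 = db.transact({("alice", "email"): "alice@old.example"})
t2 = db.transact({("alice", "email"): "alice@new.example"})
```

Copying the whole dictionary on every transaction is wasteful and is done here only to keep the sketch short; persistent data structures share most of their structure between versions instead.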

   

6. Do you run into practical issues, say, with some of the EU data requirements in terms of being able to definitively delete something in the system? How does that work in an immutable store?

So, there you have a notion of immutability meaning nothing will ever change. That doesn’t mean you can’t forget something, because they are different things. We’ve definitely heard that requirement and are looking to address it. Currently we don’t have that capability, but it’s certainly possible to do: basically, when you have novelty you incorporate it in the new index, and if you were to need proper permanent deletion for regulatory reasons, you could also not incorporate the data in the next version of the index, and therefore it will disappear. I don’t actually count that as mutation, because no one who has seen something will ever see it change in front of them; it’s just that future queries will not include that data.
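The answer describes a capability Datomic did not have at the time, so the following Python sketch is purely hypothetical. The next_index function and its forget parameter are invented for illustration. It shows how a new index version can leave data out without changing any version that someone has already seen.

```python
from types import MappingProxyType


def next_index(current, novelty, forget=()):
    """Build the next read-only index version.

    Keys listed in `forget` are simply not carried over; `current`
    itself is never modified.
    """
    merged = {key: value for key, value in current.items() if key not in forget}
    merged.update(novelty)
    return MappingProxyType(merged)


# Hypothetical usage: v1 still holds the tax id, v2 never contains it.
v1 = next_index({}, {"name": "Ann", "tax_id": "123-45"})
v2 = next_index(v1, {"city": "Oslo"}, forget={"tax_id"})
```

In this sketch the data is only truly gone once nothing refers to the older version any more, so a real system would also have to discard the old index segments.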

   

7. If you could speak a little bit of that, what is the current architecture around the cloud and how you might address those kind of questions?

Peter's full question: That makes sense. Also I noticed that when I talked to a number of people about Datomic, one of the first questions that comes up, especially in larger organizations, is: we were a little confused about the pricing model and we were wondering if we have to be on the plan to run it. If you could speak a little bit of that, what is the current architecture around the cloud and how you might address those kind of questions?

Sure. In the case of the cloud we definitely heard that. Our first supported storage option was DynamoDB, which is a cloud-oriented thing, and we had a lot of people say “I love the idea, it’s all great, but my company is not ready for the cloud”, or “regulatory, security or company regulations prohibit me from using the cloud, how can I use this behind my firewall, on premises?” The model was always to support other storages, and that again is part of the architectural advantage of immutability: you can make independent decisions about storage. We have already added support for behind-the-firewall storage in SQL databases, so you can back Datomic with any SQL database, as well as Infinispan, which is a memory store, and we’re in talks with some of the key/value vendors to support those distributed key/value stores as backends for Datomic. So all of that is something that could be run behind the firewall, and it’s definitely our intention to support that well.

   

8. Great and just to make it clear for anyone who hasn’t got the background, basically you distinguish the nodes that are used for storage versus the nodes that are used for processing.

Correct. Datomic breaks the traditional database apart into at least three pieces. It has a Transactor, which only handles transaction processing; it doesn’t answer any queries and it doesn’t maintain storage itself. Then we use third-party storage systems: any of these things we are talking about, Dynamo, or SQL, or Infinispan, or Riak and things like that in the future, can be used as storage. That’s the second leg. And then query can move into another tier of your application. It’s therefore highly scalable, because any system that incorporates the Datomic library can answer queries on its own, since it has direct access to storage. You get elastically scalable query capability because you are not asking a single machine, or a single machine in the cluster, to answer all of your questions.
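The three-way split can be illustrated with a small Python sketch. The class names echo the roles named in the answer, but the methods, the key layout and the use of a plain dictionary as storage are assumptions for illustration, not Datomic's actual design or API.

```python
class Storage:
    """Stand-in for a third-party key/value store (hypothetical interface)."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, value):
        self._blobs[key] = value

    def get(self, key):
        return self._blobs[key]


class Transactor:
    """The single writer: it accepts transactions and answers no queries."""

    def __init__(self, storage):
        self._storage = storage
        self._t = 0
        storage.put("index/0", frozenset())
        storage.put("latest", 0)

    def transact(self, facts):
        previous = self._storage.get(f"index/{self._t}")
        self._t += 1
        # Each index version is written once under a new key and never changed.
        self._storage.put(f"index/{self._t}", previous | frozenset(facts))
        self._storage.put("latest", self._t)  # the only value that is overwritten
        return self._t


class Peer:
    """An application process that answers queries by reading storage directly."""

    def __init__(self, storage):
        self._storage = storage

    def query(self, predicate, t=None):
        if t is None:
            t = self._storage.get("latest")
        return sorted(fact for fact in self._storage.get(f"index/{t}") if predicate(fact))


# Hypothetical usage: one writer, any number of independent readers.
storage = Storage()
transactor = Transactor(storage)
peers = [Peer(storage), Peer(storage)]
transactor.transact([("alice", "likes", "clojure")])
transactor.transact([("bob", "likes", "java")])
```

Adding query capacity in this sketch means creating more Peer objects. The Transactor does no extra work for them, which is the elastic read scalability described above.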

   

9. Does Datomic add to the cognitive load of developers in terms of requiring them to take over some of the jobs that the traditional relational database would be handled for them by that one black box?

Peter's full question: Traditionally, the first generation of NoSQL datastores gave developers more flexibility, but I think it also required more from the developers. For example, if you use Couch and MVCC, so you have got multi-version concurrency control, potentially you are going to get conflicting writes that have to be fixed after the fact. Similarly, working with MongoDB, if you have a transaction across a collection of documents, it is probably a bad choice to do it that way, but if you chose to have transactional responsibilities going to your application layer,

No, it does not. In fact it sides with the traditional databases in saying transactions are good and consistency is good. Datomic really sits in between, and it says the world is not black or white; it’s not all consistent, all centralized versus all distributed, inconsistent. There is something in between: again, you take this model of immutability that lets you pull things apart, and two of the things that split apart are transaction processing and query. Now you can make independent decisions about the scalability and availability modes of those two things. We actually chose a pretty traditional mode for transaction processing, which is that it’s only vertically scalable and has a traditional high availability model, but for the query side we get elastic scalability, and for the storage side we get distributed scalability. So you now have different choices, you can make a different decision about each of those, and you end up with a system with a different set of characteristics. For a lot of companies I think that is really a sweet spot: transactions plus query and read scalability.

   

10. Do you see Datomic as an addition to an existing stack or very much as a replacement to it?

Peter's full question: And in practice, some of the NoSQL data stores are often seen as supplemental, we need to throw in a key/value store here, it would be great to keep the data in Oracle but if we have Neo4j you can run certain classes of queries that otherwise you couldn’t perform on a relational database. Do you see Datomic as an addition to an existing stack or very much as a replacement to it?

Oh, you can definitely grow into it as an addition. I think you’ll see people doing greenfield work where they can make a choice about a new database and a new system, and they’ll have a lot of fun doing it from scratch in Datomic. But particularly with the support of SQL storage, you now get a very smooth incremental adoption model where you say: “I’ve already got this big SQL database, I’m already backing it up, I already have people who know how to maintain it and keep it running, I’m very comfortable with that. I’d like to start using this model, maybe on a new sub-system. I don’t actually want to change storages right now; I want to stay on the storage that I’m familiar with and only start using the programming model, writing new applications whose data is stored right alongside my other relational data and is backed up along with it and everything else.” I think that is a very smooth transition and I expect a lot of people to take advantage of it.

   

11. Did you feel like there is going to be a layer between applications and Datomic over time that solves a similar class of problems, or do you feel like that is inherently built into the model and the API that Datomic provides you?

Peter's full question: Nice. Now, first there was the relational database, and we saw that it was easier than querying from arrays of tables, and that was great. Then we saw that CRUD seemed to be too much work, so the object-relational mapper was born. And now we even have object-document mappers trying to apply a similar model to document stores like Couch and Mongo. Did you feel like there is going to be a layer between applications and Datomic over time that solves a similar class of problems, or do you feel like that is inherently built into the model and the API that Datomic provides you?

Certainly, people always write wrappers, they just love wrappers, so I will never say that they won’t do it. People will do that, but the design of Datomic is meant to encourage people not to, and there should be a lot less of a need felt to do that kind of wrapping. Datomic maps directly back to your native data structures: it has a very strong notion of an associative map that it can return, which is directly and quite transparently usable in most programming languages, and it has a data orientation that we think people will want to preserve in their applications. That should reduce the pressure for these mapping layers.

The other side of it is though, a lot of these layers exist because people are trying to manipulate data in their process and the only tool they have is their object oriented language. Part of what Datomic is trying to do is to deliver another tool that you can use inside your process, by giving you a query engine you can embed in your process, you can now start doing declarative data manipulation in your process. It used to be something that was only the purview of your database server, and I think that’s why you got this us versus them problem and now you are saying I have that power myself and I should use it.

   

12. One other thing, when I first saw Datomic something immediately came to mind just from the simplistic immutable data store was the idea of CQRS event sourcing, Greg Young's work, how do you see Datomic – do you see one use case being as a store for an event sourcing kind of model?

There certainly is a sense in which Datomic subsumes that whole thing. That’s sort of an architectural pattern and paradigm, and I wouldn’t say Datomic is co-aligned with it, but Datomic definitely covers a lot of the same ground. I think it does so quite strongly in a lot of areas, for instance the ability to recover a past point without replaying it all, by concretely storing the results of transactions, and by actually affording a uniform model. One of the things about CQRS is the two-model thing, and I am not sure you need two models if you have one really good model. So we tried to unify the model of change and query, so that it puts it all together. Many people have seen the similarities, and I think people should be careful not to make an equivalence of it, because while similar, they are distinct. But they both care about keeping track of everything that has happened, so in that respect I am a hundred percent on board. That’s a great idea, everybody should do it.

   

14. Got it. So right now it’s a persistent store for JVM languages and worst case you create a facade in a JVM language, spin that up and expose it.

Rich: We will do that, hopefully better than you would for yourself. And keep people from reinventing that wheel so that’s coming.

Peter: And I hate to push you on deadlines, but do you feel like that is coming in a few months or?

Rich: Yes

   

15. If you were advising a Java developer which to start with, what are some of the heuristics or rules of thumb that you would say it would better make sense to them?

Peter's full question: Ok, great. Anything else? When you are trying to decide a lot of engineers don’t have a lot of experience with either functional programming or tools like Datomic. If you were advising a Java developer which to start with, what are some of the heuristics or rules of thumb that you would say it would better make sense to them to start digging into Clojure versus saying "Let me keep my Java code and let me see how I can hook Datomic up as a backend", which immutability would you start with in what kinds of cases?

I think that changing a programming language is a much bigger deal than changing your database. Obviously moving an existing system would be huge, but if you are doing a new system, adopting a new database is more straightforward than adopting a new programming language. I never sold anybody on Clojure and I am not going to start now. If people have heard about it and are interested in it, I think they definitely should try it. If they are comfortable with Java, I think there are huge benefits to moving to a data store like Datomic, for all languages, and so I would definitely advocate that first.
