
Timothy Baldridge on Clojure's Core.Async

Interview with Timothy Baldridge by Werner Schuster on Feb 20, 2014 | 20:06

Bio: Timothy Baldridge (@timbaldridge) is a developer with Cognitect Inc. He hails from the mountain regions of Denver, Colorado (USA). He is a polyglot programmer with experience in Clojure, C#, Python, and Erlang. Most recently he was deeply involved in the development of Clojure's Core.Async library, where he designed, implemented, and maintains the state-machine code rewriting macro known as "go".

Code Mesh London is an annual conference dedicated to non-mainstream technologies. In 2013 it featured talks from over 50 inventors and experts in languages, libraries, operating systems and technologies that handle the programming and business challenges of today. Programming languages discussed ranged from technologies that have been around for a while such as Haskell, Clojure or Erlang to new languages such as Elixir, Rust, Go and Julia.

   

1. We are here at CodeMesh 2013 in London. I’m sitting here with Timothy Baldridge. So Timothy, who are you?

I’m Tim Baldridge, I’m a developer with Cognitect. We are a Clojure and Ruby consulting company out of Durham, North Carolina in the US, and I’m actually based out of Denver, Colorado, which is even further west, but I work remotely from there.

   

2. Cognitect is the company that houses Rich Hickey nowadays?

Yes, the company used to be called Relevance, and then a few months ago we merged with Rich's company to form Cognitect. We do consulting and we also do a lot of work on the Clojure compiler and libraries, and we built and maintain Datomic, which is a functional database.

   

3. Do you also work on Datomic or what is your main job?

I mostly do contracting work with companies, so I work with the various clients that we have, doing pretty much purely Clojure work. I mostly do the client side of it.

   

4. You worked on a project called core.async. Now everybody either loves or hates async, what did you do, what is core.async?

core.async is a library for doing what is called CSP, or Communicating Sequential Processes, style of programming in Clojure. CSP at its core takes asynchronous programs apart into two things: processes, and queues that the processes use to communicate with each other. These queues are of a fixed length. They can be blocking, where a full queue blocks the process from putting more data into it, or they can have some sort of dropping semantics that remove messages when the queue gets full. That is the model: you have these very lightweight independent processes that communicate over channels.

This makes asynchronous programming easy because we can, for instance, build an abstraction over an asynchronous interface to a database such as Cassandra: you make a query, the query API returns a channel, and your process just takes a result from that channel, blocking until the value arrives. In a lot of ways that is like promises or async/await in C#, but where a lot of those things are one-shot, this is a continuous stream of data. So you may take only one item, but you can take an infinite number of items from the channel if that is the semantic you want.
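To make the channel model concrete, here is a minimal sketch, assuming the library is available as `clojure.core.async`. The `fake-query` function and its rows are hypothetical stand-ins for an asynchronous database API; they are not from the interview.

```clojure
(require '[clojure.core.async :refer [chan >!! <!! thread close! dropping-buffer]])

;; A channel with a fixed-size buffer of 2. Once two values are queued,
;; a further blocking put waits until a consumer takes something.
(def results (chan 2))

;; Hypothetical stand-in for an asynchronous query: a separate thread
;; produces rows, puts them on the channel, then closes it.
(defn fake-query [out]
  (thread
    (doseq [row [{:id 1} {:id 2} {:id 3}]]
      (>!! out row))          ; blocks this producer while the buffer is full
    (close! out)))

(fake-query results)

;; The consumer takes as many values as it wants. A take from a closed,
;; drained channel returns nil, which ends the loop.
(def rows
  (loop [acc []]
    (if-let [row (<!! results)]
      (recur (conj acc row))
      acc)))

;; Dropping semantics: when the buffer is full, new values are discarded
;; instead of blocking the producer.
(def lossy (chan (dropping-buffer 1)))
(>!! lossy :first)            ; accepted
(>!! lossy :second)           ; buffer is full, so this value is dropped
(def kept (<!! lossy))
```

Here `rows` collects all three maps in order, and `kept` is `:first` because the second put was discarded rather than queued.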

   

5. What does it actually look like when I use core.async, what do I write? Do I have to register handlers, what do I do?

At the core, the library is very much about channels. A channel is basically a queue, and you can attach a reader or a writer to either end of it. Those are one-shot: to take an item from the channel you give it a callback to execute when the value arrives, and once that happens the callback is removed from the channel, so you get one item for each callback attached. It's the same thing for put: you can have a callback for when the put succeeds. Now obviously that gets us into a state we like to call callback hell, where your code is just callbacks all over the place. So on top of that, core.async supplies some primitives for doing blocking operations within a thread, so you can put and block on the put. That is done with promises, and it will block the actual Java thread until the put or the take succeeds.

But then we also wrote a go macro, which gives you a lightweight process. It takes your code within the macro itself, rips it apart, rewrites it into a state machine, and attaches parts of that code block as callbacks to the channels where the I/O operations happen. One of our goals when writing the library was to implement it purely as a library. There are other languages out there that do CSP, and often it’s a language-level feature. Being a Lisp, having macros, and being able to do pretty much whatever we want, we thought: let's push this and see if we can implement the entire concept just with macros and the other primitives that we have, and it works pretty well. We designed it for Clojure 1.5 and 1.6, and I’ve heard it runs on 1.4, though I’ve never tested it. It also runs on ClojureScript, which is Clojure on JavaScript, so it’s very portable in that way.
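The three styles described in this answer can be sketched side by side as follows, assuming the library is available as `clojure.core.async`. The channel names and values are made up for illustration.

```clojure
(require '[clojure.core.async :refer [chan put! take! >!! <!! >! <! go]])

;; 1. Callback style: take! attaches a one-shot callback to the channel.
(def c1 (chan))
(def seen (promise))
(take! c1 (fn [v] (deliver seen v)))   ; runs once, when a value arrives
(put! c1 :hello)                       ; asynchronous put, no callback given
(def callback-result @seen)

;; 2. Blocking style: >!! and <!! block the real Java thread.
(def c2 (chan 1))
(>!! c2 :world)
(def blocking-result (<!! c2))

;; 3. go blocks: >! and <! look sequential, but the macro rewrites the
;;    body so that each one parks the block instead of blocking a thread.
(def c3 (chan))
(go (>! c3 41))
(def go-result (<!! (go (inc (<! c3)))))   ; a go block returns a channel
```

After this runs, `callback-result` is `:hello`, `blocking-result` is `:world`, and `go-result` is `42`. The final `<!!` is only there to pull the value out at the top level; it is a JVM-only operation.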

   

6. These go blocks essentially bring the sequential back into the async: you write this sequentially and you have blocking semantics, but underneath your threads still keep running, right?

That is correct. What we like to call it, and it is kind of an unfortunate term if you come from the OO world, is Inversion of Control. It has nothing to do with what we're familiar with in the OO world, so not dependency injection. It's the idea of taking code that expects to call you and turning it into code that looks like you are calling it. That is really what the go macro does: it takes portions of your code and turns them into callbacks, and as it does so it writes the local variables in the go block to some storage, an array actually, so that when a callback completes it can load the variables back in. For those who are familiar with how async/await works in C#, it’s very, very close. When I wrote the go macro I pretty much read all the public information on that and took as much as I could from it.

Werner: You bring up C# async/await which, I guess, is done in the language, and they also do a sort of rewrite to a state machine.

The two approaches are very, very similar. You turn the function into a class, you move local variables from being local variables to being fields on the class, and then you normally have an entry point that takes an integer saying “What state do you want to jump to in the state machine?” Each time you exit you increment that to the next state, and now you have the ability to jump anywhere within the state machine.
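As a rough illustration of that idea, here is a hand-written state machine for a block that takes two values and adds them. This is a simplified sketch and not the code the go macro actually generates; `new-frame`, `step`, and the `:parked` marker are hypothetical names.

```clojure
;; Slot 0 of the array holds the current state number;
;; slot 1 holds a local variable saved across a suspension.
(defn new-frame [] (object-array [0 nil]))

(defn step
  "Resumes the machine with `input`, the value that just arrived.
  Returns :parked while waiting for another value, or the final sum."
  [^objects frame input]
  (case (aget frame 0)
    0 (do (aset frame 0 1)        ; started, now waiting for the first value
          :parked)
    1 (do (aset frame 1 input)    ; save the first value before parking again
          (aset frame 0 2)
          :parked)
    2 (do (aset frame 0 3)        ; finished
          (+ (aget frame 1) input))))

;; A channel callback would call `step` each time a value arrives.
(def frame (new-frame))
(def r0 (step frame nil))
(def r1 (step frame 40))
(def r2 (step frame 2))
```

The first two calls return `:parked` and the third returns `42`. The saved local survives between calls because it lives in the array, not on the stack.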

Werner: That is pretty hairy code to write yourself.

It is. Currently I think the go macro is about 700 lines of Clojure code as a macro, so yes, it’s pretty hairy, but the nice thing about it is that you can map a lot of the Clojure semantics to that. For instance, we can catch an exception within a go block and then do an asynchronous operation in the catch, or an asynchronous operation in a finally. This is one of my favorite parts of the go macro: you can throw an exception and catch it a couple of try blocks above where you currently are, while doing an asynchronous operation in the finally block. That will execute, do your operation, and once it completes the exception continues to propagate up until you hit the catch. So it really just allows you to throw a put or a take from a channel anywhere in the go block that you want.

   

7. So you're reimplementing exceptions?

Yes, it’s kind of a state machine within a state machine and we have to track exceptions and how they're handled and then unwind them, but the nice thing is that code really only runs when there is an exception which is not very often. So there is a slight performance hit to all of this. Often I have found though that the amount of time spent doing whatever your asynchronous operation is, the I/O, the amount of time you wait on that, outweighs the performance penalty of writing in this form.

   

8. Does the performance penalty come from pushing the state machine forward and storing things?

Yes, that is primarily what it is. You also have to store locals in such a way that if the go block is resumed on a completely separate thread, all the locals are flushed out of the CPU cache, so they are basically what Java calls volatiles, and that slows things down a little as well. So there are a lot of these little things that have to be done, but it generally works really well.

   

9. But it’s something that you do for I/O, and I/O is pretty slow anyway, so it's not going to be CPU bound. Since we are already talking about I/O being slow, what do you support in these go blocks right now? What I/O do you support, can I write my drivers with it, my MySQL drivers, what’s the story there?

There are only three operations we support, I/O-wise, inside a go block that will cause it to be attached as a callback: a put, a take, and what we call an alt, which is like a select operation. It basically says: we want to take from these three channels or put into this channel, and whichever one succeeds first, continue, giving you the value that you either took or put, and so forth. This is a construct that most CSP languages have; the Go language, I believe, has this sort of thing. So these are the only three operations, and in reality this is all you really need. I mentioned before that put and take are also available as plain functions: given a channel, a value to put into that channel, and a callback to call after it succeeds, you can interface with channels in a more standard way, using functions and callbacks.

That sort of thing maps very well to the libraries that you’ll use to communicate with the outside world. As an example, Clojure has a library called http-kit that does asynchronous HTTP operations. It provides a function called 'get' which does an HTTP GET: you give it a URL and a callback that will be called once the GET succeeds. That callback receives a response object, and then you can put that response object onto a channel, return the channel from the function that you wrap this all up in, and now you basically have a URL-to-channel constructor. So that is really the pattern that we use. We try not to introduce new ways of doing I/O inside a go block; instead we just have everything return a channel, and then anything that accepts a channel can work with these primitives.
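The wrapping pattern and the alt operation can be sketched together as follows, assuming the library is available as `clojure.core.async`. To keep the example self-contained, `fetch-async` is a hypothetical callback-based function standing in for an asynchronous HTTP client; it is not http-kit's API, and `fetch-chan` and `fetch-with-timeout` are likewise made-up names.

```clojure
(require '[clojure.core.async :refer [chan put! go alts! timeout <!!]])

;; Hypothetical callback-based API: invokes `callback` with a response map
;; on another thread after a short delay.
(defn fetch-async [url callback]
  (future
    (Thread/sleep 50)
    (callback {:status 200 :url url})))

;; The "URL-to-channel constructor": put the callback's argument onto a
;; channel and return that channel to the caller.
(defn fetch-chan [url]
  (let [out (chan 1)]
    (fetch-async url (fn [response] (put! out response)))
    out))

;; alts! waits on several channel operations at once and returns a
;; [value channel] pair for whichever completes first.
(defn fetch-with-timeout [url ms]
  (go
    (let [response-ch (fetch-chan url)
          [v ch]      (alts! [response-ch (timeout ms)])]
      (if (= ch response-ch)
        v
        :timed-out))))

(def fast (<!! (fetch-with-timeout "http://example.com" 2000)))
```

With a two-second limit and a 50 ms simulated delay, `fast` holds the response map. Given a limit shorter than the delay, the timeout channel would win and the result would be `:timed-out`.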

   

10. core.async has been around for a while now, a short while, or a long while on the internet. People have tried it and some people had comments about certain behaviors. In go blocks I think you have to write all your code inside the go block directly and you can’t factor out a function, so what is your take on that?

The basic idea is that the state machine transformation happens only within the go block itself; it doesn’t descend into the functions that the block calls. There are two reasons for this. One is purely practical: macros do not really support transforming your program as a whole, they are only given the expression within the body of the macro. So we would probably have to drop to the compiler level to do that sort of transformation. On the other hand, that sort of transformation begins a kind of infectious cycle: if you have five functions, and way deep inside those functions you do a put or a take on a channel, and you want that to run in the state machine model, now you have to transform all those functions.

So it’s very hard to determine what functions could be called, especially in a functional language where you are passing functions around as parameters and you never really know which functions you need to translate and which ones you don’t. The other way I like to look at this, and the other criticism I’ve heard, is that putting and taking from these channels is kind of side-effecting: you are modifying something. Now in CSP they have ways of explaining how this is not much of a problem, and I don’t think it is a problem. But if we assume it is, this behavior of go kind of helps, because it forces us to do I/O at the ends of our modules. We can build our modules as purely functional modules that take an input message and put a message somewhere else, and the core can be functional with really no state, and that seems to work pretty well. So we are trying to encourage people to write in that way and think of ways of restructuring their code so they don’t have to do a put or take five layers down in a function call stack.

Werner: You mentioned that core.async also runs on ClojureScript, and you mentioned that you have blocking puts and gets in the core.async implementation. So how do you do that in JavaScript? You don’t have any threads, so how can you block? If you block, that's it.

We have two forms of put and take, actually three, but we will talk about two right now. One is a blocking put and take that blocks the actual thread itself, and that is not supported in ClojureScript. In Clojure we can use the blocking put and blocking take anywhere in any thread. The other form is what we call the parking put and take, which is only allowed within a go block, works with the transformation, and attaches a callback to a channel. The semantics of that look very close to what we are used to seeing when we do any sort of asynchronous operation in JavaScript. So basically we have a system where, when a value is put into a channel, the callback that needs to be called is shoved into a queue of pending callbacks, and then every once in a while a timeout runs and starts running those callbacks.

Most of the time that happens almost instantaneously. When you are building a whole application in ClojureScript, the little callbacks or states that are running in the go blocks are normally very small, so they just queue up and run as the main thread in JavaScript frees up. That allows us to do some really cool stuff. For instance, traditionally when you run an animation function in JavaScript, you have to have some sort of state, basically a state machine, and say: “Ok, I’m going to wait for ten milliseconds and then decrease the opacity by one, do this over and over until the opacity is zero, and then I’m going to delete this div from the DOM”.

That is pretty hard to write, because each time you want to wait for a certain amount of time you have to attach a callback to the timeout and call setTimeout over and over again. We have what we call timeout channels in core.async, which are channels that will return nil after a certain amount of time, so it’s very easy to say: “All right, decrease the opacity by one, take from a timeout channel of 5-10 milliseconds, do this in a loop until the opacity is zero, now delete the div”. This is a single block of imperative-looking code that is very easy to understand. What it is doing is very apparent when you first look at it, yet behind the scenes it's written in the same way you'd write the other code.
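A sketch of that fade-out loop might look like the following, assuming the library is available as `clojure.core.async`. It is written for Clojure on the JVM so that it stays self-contained: the `element` atom is a hypothetical stand-in for a DOM node, and `fade-out!` is a made-up name.

```clojure
(require '[clojure.core.async :refer [go-loop <! <!! timeout]])

;; Hypothetical stand-in for a div: an opacity value and a flag that
;; says whether it is still attached to the page.
(def element (atom {:opacity 10 :attached? true}))

(defn fade-out! [el]
  (go-loop []
    (if (pos? (:opacity @el))
      (do
        (swap! el update-in [:opacity] dec)   ; decrease the opacity by one
        (<! (timeout 5))                      ; park for about 5 ms, no thread is blocked
        (recur))
      (swap! el assoc :attached? false))))    ; opacity is zero: "delete the div"

;; Wait for the animation to finish. This blocking take is JVM-only;
;; in ClojureScript the go-loop would simply run on its own.
(<!! (fade-out! element))
```

Once the loop finishes, `@element` is `{:opacity 0, :attached? false}`. The whole animation reads as one imperative loop, with the waiting expressed as a take from a timeout channel.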

   

11. What is the story with sharing code between Clojure and ClojureScript here? Do you recommend people use the parking put and take, or do you want them to specialize code for Clojure or ClojureScript?

What I normally recommend to people is that you should use go blocks whenever you are doing CPU-bound tasks, and use what we call the thread block, which dedicates an actual Java thread to that block of code, for more I/O-bound operations. If you are working with asynchronous libraries you may not even need to use that. So the porting situation is really just a matter of making sure that the I/O parts of your program are similar enough. That is often a fun problem to solve, because Clojure doesn’t really wrap a lot of I/O libraries and you have to have some sort of interface to I/O that looks similar, but aside from that it just works. A colleague of mine was working on some code that he wrote in Clojure and tested in Clojure. He had written it with the idea of it running on ClojureScript but had never done that. So it was all tested and working, and then he just recompiled it in ClojureScript and it worked the first time, which is a great story I think. There are always little platform differences that you have to be aware of, but that comes with the territory of porting any code.
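The split between thread blocks and go blocks can be sketched like this, assuming the library is available as `clojure.core.async`. The `blocking-read` function is a hypothetical stand-in for a blocking I/O call.

```clojure
(require '[clojure.core.async :refer [chan go thread <! <!! >!! close!]])

;; Hypothetical blocking I/O call: it ties up its thread while it waits.
(defn blocking-read []
  (Thread/sleep 20)
  "payload")

(def raw (chan 1))

;; thread dedicates a real Java thread to this body, so blocking here
;; does not hold up anything else.
(thread
  (>!! raw (blocking-read))
  (close! raw))

;; The go block only parks while waiting for the value, then does the
;; in-memory work on it.
(def processed
  (go
    (when-let [s (<! raw)]
      (count s))))

(def n (<!! processed))
```

Here `n` is `7`, the length of the string read by the dedicated thread. Only the go block uses parking operations, so that part is the piece that could be shared with ClojureScript.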

   

12. I think if our audience has not already done so, they should definitely check out core.async, because callback hell is horrible, that's why it's called “hell”. So to wrap up, Timothy Baldridge, what is your favorite Monad?

My favorite Monad is the Monad that gets the job done. I have a very pragmatic view of Monads: I will use them if they make my life easier, otherwise no, I won’t. I tend to use the state Monad a lot, so that is probably my favorite.

Werner: So it’s the Get-It-Done Monad, or the state Monad in a pinch.

The “write it for me” Monad, that is what I would like but doesn’t exist yet.

Werner: ”Do what I mean” Monad. Well thank you very much!

Thank you!

InfoQ.com and all content copyright © 2006-2013 C4Media Inc.