
Gregory Collins on High Performance Web Apps with Snap and Haskell

Interview with Gregory Collins by Sadek Drobi on Jun 17, 2011

Bio: Gregory Collins is a Canadian software engineer living in Zürich, Switzerland. He works at Google Zürich in the Site Reliability Engineering team. Gregory holds a Master's degree in Computer Science from Yale University. He's involved in the Snap project (http://snapframework.com/), a Haskell-based web framework.


   

1. Gregory, can you present yourself?

Yes. My name is Greg Collins. I am a site reliability engineer at Google Zurich in Switzerland and today I am here to talk about the Snap framework which is a web server and web programming library written in Haskell.

   

2. It’s interesting that your web framework is written in Haskell; there are not a lot of web frameworks on top of Haskell. Can you tell us a bit about the framework and its architecture?

Snap is a library that you link with to produce a program that is a web server. Haskell is a natively compiled programming language: it produces object code that links into an executable and runs on the bare machine, bare metal. This model allows us to write web programs in a really high-level way, because Haskell is a high-level language, but they get compiled down to web servers which are very fast. Haskell is competitive in performance with lower-level languages like Java and C, and we also get the advantage of having a very high level of expressiveness in the code.

   

3. So why is it performant?

The Haskell compiler that everyone uses is called GHC (the Glasgow Haskell Compiler), and it’s the product of 20 years of research into how to make high-level programs and constructs compile down efficiently into really fast native code, so we are really just piggybacking on the work that the team at Microsoft has done. Part of the appeal of Haskell for this kind of work is that it is a very high-level language, but it has a very nice and simple foreign function interface to C. You can call C functions from within Haskell really easily. So native-code things like talking to the operating system, using native system calls to do IO and that kind of thing, are actually really easy in Haskell, and there is not a lot of goop between you and, say, the Linux kernel when you want to do IO. This is one of the reasons that the Snap framework can be fast.
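To make the foreign function interface point concrete, here is a minimal sketch of calling a C function from Haskell. It binds `cos` from the C math library as a stand-in for the system calls mentioned above; the names `c_cos` and `cCos` are illustrative choices and are not taken from Snap.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.Types (CDouble (..))

-- Bind the C library's `cos` directly. `unsafe` skips some call overhead
-- and is only appropriate for short C calls that neither block nor call
-- back into Haskell.
foreign import ccall unsafe "math.h cos"
  c_cos :: CDouble -> CDouble

-- A thin wrapper so the rest of the program sees ordinary Haskell Doubles.
cCos :: Double -> Double
cCos = realToFrac . c_cos . realToFrac

main :: IO ()
main = print (cCos 0)
```

The single `foreign import` declaration is all the glue needed; no wrapper code has to be written on the C side.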

   

4. Can you explain how Snap interacts with HTTP?

Snap uses an IO model called Iteratee IO, which is an inversion of control. In Java land you talk about a socket: you want to push bytes into it, you want to take bytes out of it, and it’s explicit read and write calls. The thing that is in control in that context is the function that is doing the pulling of the IO. Iteratees invert this idea: you give the consuming function to the enumerator that is taking the bytes out of the socket, and it handles all of the buffer management and just gives you chunks of data. The consuming function turns out to be a state machine. If you get a bunch of data you can do three things with it. You could error (we won't talk about that). You can say: "I am done, I am finished processing the stream and I am going to give you a value."

Or you can say: "You didn’t give me enough data, I need more to be able to decide what to do with this." So from an HTTP standpoint, we have what is called an enumerator, which is just a simple little function which pulls data out of the socket and hands it off downstream to the Iteratee on the other end. So there are two halves of it: there are enumerators and Iteratees. Enumerators feed data to Iteratees, which consume it. The reason we do it this way, which is backwards from the way that you would normally think about it, is that these things can compose with each other. If I have an Iteratee that consumes data read from a socket, I can wrap it in another Iteratee that does things like chunked transfer decoding. On the other end, if I have an enumerator which is producing data to a socket, I can wrap the socket in something that will take the data given to it and produce chunked transfer encoding.

So you can just plug these things together like Lego on a really high level, but underneath you have efficient buffer management, you maintain protocol invariants, you have streaming in constant space, all of these things that are transparent to the end user, and it’s all very efficient.
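The enumerator and Iteratee halves described above can be sketched in a few lines of plain Haskell. This is a simplified, hand-rolled model for illustration only, not Snap's actual types or its underlying library: it uses `String` chunks instead of byte buffers and runs purely instead of over a socket, and every name in it (`Stream`, `Iteratee`, `enumList`, `mapChunks`, `line`, `run`) is an assumption.

```haskell
module Main where

import Data.Char (toUpper)

-- What an enumerator hands downstream: a chunk of input or end-of-stream.
data Stream = Chunk String | EOF

-- The consumer is a state machine with the three outcomes from the interview.
data Iteratee a
  = Done a String                    -- finished: a value plus unconsumed input
  | NeedMore (Stream -> Iteratee a)  -- not enough data yet
  | Failed String                    -- error

-- An iteratee that consumes input up to the first newline.
line :: Iteratee String
line = go ""
  where
    go acc = NeedMore step
      where
        step EOF       = Failed "unexpected end of stream"
        step (Chunk s) = case break (== '\n') s of
          (pre, '\n' : rest) -> Done (acc ++ pre) rest
          (pre, _)           -> go (acc ++ pre)

-- A toy enumerator: feeds chunks from a list (standing in for a socket)
-- and stops as soon as the iteratee no longer asks for input.
enumList :: [String] -> Iteratee a -> Iteratee a
enumList (c : cs) (NeedMore k) = enumList cs (k (Chunk c))
enumList _        it           = it

-- Wraps an iteratee so that every chunk is transformed before it is
-- consumed. A chunked-transfer decoder would sit in this position.
mapChunks :: (String -> String) -> Iteratee a -> Iteratee a
mapChunks f (NeedMore k) = NeedMore step
  where
    step (Chunk s) = mapChunks f (k (Chunk (f s)))
    step EOF       = k EOF
mapChunks _ it = it

-- Signals end-of-stream and extracts the final result.
run :: Iteratee a -> Either String a
run (Done a _)   = Right a
run (Failed e)   = Left e
run (NeedMore k) = case k EOF of
  Done a _   -> Right a
  Failed e   -> Left e
  NeedMore _ -> Left "iteratee still wants input after EOF"

main :: IO ()
main = do
  -- The line arrives split across two chunks; the iteratee does not care.
  print (run (enumList ["GET / HT", "TP/1.1\nHost: x"] line))
  -- The same consumer, wrapped in a transforming layer.
  print (run (enumList ["get / ht", "tp/1.1\n"] (mapChunks (map toUpper) line)))
```

Note that `line` never issues a read itself: the enumerator decides when data arrives and how it is split into chunks, which is the inversion of control described above.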

   

5. So the idea is that you don’t have to care about any of these things, like buffer management; it’s more declarative?

Yes, it can be more declarative. There are instances where to do things that are a little unusual or out of the box, you do sometimes have to think about the underlying IO model, but in a lot of cases you do that once, and then you have a reusable sort of thing that you can plug in. The thing with Iteratees is they compose. If I have a thing that consumes an HTTP header, let’s say, and then I have another thing which consumes the body I can chain them together like this and then I have another function or an Iteratee which consumes the header and consumes the body. So just like functions they chain together, they compose, you can wrap them, they have all the same sorts of composability properties that Haskell functions have.
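The chaining described in this answer, one consumer for the header followed by another for the body, can be sketched by giving a toy iteratee type a `Monad` instance. As before, this is a simplified model for illustration and not Snap's API: the types and the names `line`, `remaining`, `request`, `feed`, `enumList` and `run` are all assumptions, and the "header" here is just a single line.

```haskell
module Main where

import Control.Monad (ap)

data Stream = Chunk String | EOF

data Iteratee a
  = Done a String                    -- a value plus unconsumed input
  | NeedMore (Stream -> Iteratee a)  -- waiting for more data
  | Failed String

instance Functor Iteratee where
  fmap f (Done a rest) = Done (f a) rest
  fmap f (NeedMore k)  = NeedMore (fmap f . k)
  fmap _ (Failed e)    = Failed e

instance Applicative Iteratee where
  pure a = Done a ""
  (<*>)  = ap

-- Sequencing: when the first iteratee finishes, its leftover input is
-- handed to the next one, so no bytes are lost at the boundary.
instance Monad Iteratee where
  Done a rest >>= f = feed rest (f a)
  NeedMore k  >>= f = NeedMore (\s -> k s >>= f)
  Failed e    >>= _ = Failed e

-- Gives leftover input to an iteratee that has not seen it yet.
feed :: String -> Iteratee a -> Iteratee a
feed "" it            = it
feed s  (NeedMore k)  = k (Chunk s)
feed s  (Done a rest) = Done a (rest ++ s)
feed _  (Failed e)    = Failed e

-- Consumes input up to the first newline.
line :: Iteratee String
line = go ""
  where
    go acc = NeedMore step
      where
        step EOF       = Failed "unexpected end of stream"
        step (Chunk s) = case break (== '\n') s of
          (pre, '\n' : rest) -> Done (acc ++ pre) rest
          (pre, _)           -> go (acc ++ pre)

-- Consumes everything up to end-of-stream.
remaining :: Iteratee String
remaining = go ""
  where
    go acc = NeedMore step
      where
        step (Chunk s) = go (acc ++ s)
        step EOF       = Done acc ""

-- Two small consumers chained into a bigger one.
request :: Iteratee (String, String)
request = do
  header <- line
  body   <- remaining
  return (header, body)

enumList :: [String] -> Iteratee a -> Iteratee a
enumList (c : cs) (NeedMore k) = enumList cs (k (Chunk c))
enumList _        it           = it

run :: Iteratee a -> Either String a
run (Done a _)   = Right a
run (Failed e)   = Left e
run (NeedMore k) = case k EOF of
  Done a _   -> Right a
  Failed e   -> Left e
  NeedMore _ -> Left "iteratee still wants input after EOF"

main :: IO ()
main = print (run (enumList ["Host: exa", "mple.org\nhel", "lo"] request))
```

The chunk boundaries fall in the middle of both the header and the body, yet `request` reads like straight-line code; the leftover handling in `>>=` is written once and reused by every composition.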

   

6. How does the Snap framework compare to the other frameworks out there, like Ruby on Rails, Django, and Java frameworks like the Play framework?

The most critical and obvious difference is that Snap is about 6 months old, or maybe a little bit more now, and we have not reached the same level of feature completeness as any of those. On the other hand, if you want to compare it with something like Ruby, I think Snap today is in a niche where you could use it in places where the lack of performance from Ruby would be a real problem. I know a lot of web problems are IO bound, but on a CPU basis I would say Haskell is probably 40 or 50 times faster than Ruby, because we compile it down to native code; it’s almost as fast as C. If you have something in your web infrastructure which is a real hotspot and doesn’t scale under one of these other frameworks, something in Python or Ruby or whatever, I think Haskell is a really good language to introduce in that kind of place, where a few years ago you would have been forced to turn to Java or C++ to get performance in your backend.

And I think because Haskell is so expressive, it’s a really attractive alternative to those kinds of languages for this niche. One property of Haskell programs tends to be that you can solve a really complicated problem and, if you choose the right abstraction, produce something which is actually simple and easy to understand. That is just a property of the way the language composes with itself.

   

7. So other than compatibility what are the other things that I can benefit from by using Snap framework and Haskell in terms of productivity?

Ignoring runtime efficiency, which we talked about, I would say that if you are not already interested in Haskell as a language, there is not much we can offer you which is compelling in and of itself. The project is still really young; for instance, we just implemented HTTP file upload support in the last version, released last month. So if you are looking for something to give you an overall productivity boost, we aren’t there yet. Within a couple of years I think the Haskell community as a whole will be in a place where you could conceivably do 50-90% of your web programming in Haskell. The main thing we offer to people is that Haskell is just a really nice language to work in.

Side-effecting IO is kept in a little jail that the whole universe of pure functions can’t affect, or vice versa. If you have a pure function, given the same inputs it always produces the same outputs, and that allows you to isolate the hairy parts, the mutable messy parts of your program, in a sandbox or a jail, a little kernel where you can keep them under control. The rest of it you don’t have to do much with at all, because a pure function parallelizes without any problems whatsoever: pure functions can never deadlock, and they can never interfere with each other in a concurrency context.
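A small sketch of that separation, using a made-up word-counting task: the function `wordFreq` is pure, which its type shows by not mentioning `IO`, while `main` is the small impure kernel that touches the outside world. The task and the names are illustrative only.

```haskell
module Main where

import Data.Char (toLower)
import Data.List (group, sort)

-- Pure: the same input always yields the same output, and the type
-- guarantees it cannot read files, write sockets, or mutate shared state.
wordFreq :: String -> [(String, Int)]
wordFreq s =
  [ (w, length ws) | ws@(w : _) <- group (sort (words (map toLower s))) ]

-- Impure kernel: the only place that performs IO.
main :: IO ()
main = do
  input <- getContents
  mapM_ print (wordFreq input)
```

Because `wordFreq` has no side effects, it can be tested, reused, or evaluated on several inputs in parallel without any locking.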

   

8. People like Ruby, Python, and PHP frameworks, for instance, because they can modify their code, hit refresh in the browser, and that is it, it loads. But Haskell is a compiled language, right?

But Haskell also has an interpreter. The GHC Haskell compiler ships with an API that you can link to, and in Snap 0.3 and 0.4 we have implemented a mode where you can compile your application so that it interprets your web handlers, and you get exactly the behavior you were describing: you can edit the code, hit refresh, and see the change right away. You wouldn’t want to use that in a production context, of course, because interpreting the code involves a lot of overhead, but I think this is kind of a neat feature. This is true in a lot of cases for Haskell code: we get to stand at a sort of midway point, where we get really fast compiled code and you can also interpret and get dynamic behavior.

   

9. When you use any frameworks, usually open source frameworks, you think about the community and the community size. So what is the community size of Haskell and Snap?

Snap is pretty small, we have, I would say, a dozen to 20 people who have contributed code to the project, but the people who are really writing most of the code are fewer than 8 I think. And people are coming in and out as they elect to do contributions. We have one guy, John Lenz, from the University of Illinois, who just showed up with a patch for SSL support like: "Here you go, here is your SSL support, for your server." So the community is very small and Haskell in general has a pretty small community. For better or for worse, in a lot of circles it’s considered to be esoteric, although this perception is changing, but I think as a community it punches way above its weight.

I think the average Haskell person, the person who is really interested in writing Haskell code and doing stuff, tends to latch on to the idea and get really excited about it; at least this is the way I feel about it. Once you’ve seen a language like Haskell it’s very difficult to go back to programming in Java or C or whatever, because of what it offers you in terms of expressiveness. From an information-theoretic perspective, the amount of entropy contained in an individual character of Haskell code is so much higher than in a lot of other languages. Ruby has this property too, which is why a lot of people tend to like it, but that coupled with the fact that at the end of the day you get a program which is really efficient appeals to a lot of people. People latch on to this idea and get really excited, so I think the community is small but it’s very active per capita.

   

10. What do you think about the next releases of the Snap framework?

The next thing up on our plate comes from some internal competition within the Haskell community in web frameworks, which is really awesome and I think has served us really well so far. There is a project called Yesod, mostly written by Michael Snoyman. His community has produced a web server called Warp. Snap is a very fast web server, and these guys have produced something which, if you believe their benchmarks, is faster still: they are claiming requests-per-second numbers up into the hundreds of thousands, so 150,000 requests/second on a Hello World benchmark. We haven’t been able to duplicate those benchmarks, but in our testing they are probably 2-3 times faster than we are, and that is awesome. It shows us that within the GHC runtime we are fast already and there is room for us to get faster.

One thing on the plate is continuing to improve from the efficiency standpoint, and the other of course is features. We have a long way to go before we can offer something to the web programming community at large which is comprehensive and unified. We have some ideas in the pipeline for how to present these things, but one of the things that we really tend to do in our project is think a lot about the APIs that we design, because once you’ve written a bad API, you are kind of stuck with it for a long time, especially if you start to get traction and adoption from people. So now that we have a server that you can program against an interface at about the same level of abstraction as Java Servlets, we want to think about what comes on top of that. Rails, for instance, has this great property that you can write little bits of Rails code and glue them to other bits of Rails code, and there you have a big project.

This is something that I think none of the Haskell frameworks have solved well yet and that is the next thing on our plate.

   

11. You are working at Google. Is Google using Haskell in any of their projects?

Not widely. There have been some teams that are using Haskell and have published that they have done so. One is the corporate virtualization team: they presented a paper at ICFP last year, written by a guy named Iustin Pop, who has been using Haskell to manage virtual machines in our corporate network.
