Steve Vinoski and Bob Ippolito on Async I/O in Python and Node.js, Web Development in Erlang
Bio: Bob Ippolito is CTO and cofounder of Mochi Media, an experienced Python and Erlang user, and a frequent open source contributor. Steve Vinoski is an architect at Basho Technologies in Cambridge, MA, USA. He's worked on distributed systems and middleware systems, and writes "The Functional Web" column for IEEE Internet Computing, in which he explores the use of functional programming languages for the web.
The Erlang Factory is an event that focuses on Erlang - the computer language that was designed to support distributed, fault-tolerant, soft-realtime applications with requirements for high availability and high concurrency. The main part of the Factory is the conference - a two-day collection of focused subject tracks with an enormous opportunity to meet the best minds in Erlang and network with experts in all its uses and applications.
Steve Vinoski: I have worked in distributed systems for longer than I care to admit, 20 years or so. I did a lot of C++ programming for years, some Java, and then found Erlang. It was quite a discovery because I'd been trying to build systems like that for years; about 5 years ago I started looking at it and just really fell in love with it. I have been doing Erlang programming in multimedia delivery for the past 4 years and I am starting at Basho next week, where I will be working on Riak, so I am staying in the Erlang community.
Bob Ippolito: I am Bob Ippolito, CTO and cofounder of Mochi Media. I have been a long time programmer since I was about 9, but professionally I had used primarily Python. But about 6 years ago I found Erlang when I was trying to build scalable and distributed systems and I fell in love with it. It’s great at what it does and almost all of our servers are written in Erlang.
Steve Vinoski: I've looked at Haskell but I never really tried to use it. I've been reading Bryan O'Sullivan's book, "Real World Haskell", and haven't quite made it through there yet. For me, I'd done a lot of work in C++ but I'd always done a lot of Perl and Python and some Ruby. At my old company, IONA Technologies, a middleware company, we had a lot of C++ code and a lot of Java code, and I was looking at some of the examples we'd give to our customers and they would be 70-100 lines long just to do something really simple. So I started looking at Ruby in particular to see if we could use dynamic languages to cut down the lines of code and make things easier for the customers to understand. I found you could get about a 10x reduction in lines of code, and the lack of type checking really wasn't an issue.
Then, when I found Erlang it was, as Bob said, really good at what it does: distributed systems, concurrency and all that, and we had a lot of that in our middleware systems. So it just seemed like such a natural fit. I never really thought about the type checking as much because of the work I'd done in Ruby, Python and Perl. I know some people are adamant about it, but I just like things that work.
Steve Vinoski: Yes, "The Functional Web", and it's for IEEE Internet Computing magazine.
They prefer the very straightforward serial, threaded approach. And Erlang gives you that approach, but it does it in a scalable way, you are not allocating megabytes of thread stacks for each socket. You are only allocating a few hundred words. So basically it’s the best of both worlds, but of course there is a paradigm shift with the functional programming language versus Python.
6. I think Python is sometimes called a LISP with a syntax. Is that true or am I misremembering that? It's also considered somewhat of a functional language because there are lambdas and things like that.
Bob Ippolito: I haven’t really heard that much recently; there are a lot of functional elements to Python. The built-in libraries are mostly composed of functions, or the built-in functions are mostly functional, like there is map and what not, whereas in another language like Ruby for example it exposes some objects that have class methods that you would call. So there are definitely functional elements to Python and you can program in a functional style, but I think that most people program in a sort of imperative object oriented style whereas the functional style is more maybe how you might implement the method, but you still have the classes and methods involved.
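As an illustration of that last point (not from the interview), here is a minimal Python sketch in which the outer structure is a class with methods while the method bodies are written in a functional style using the built-ins `map` and `filter`. The class name `WordStats` and its methods are hypothetical names chosen for this example.

```python
class WordStats:
    """Object-oriented shell: callers see a class with methods."""

    def __init__(self, words):
        self.words = list(words)

    def total_length(self):
        # Functional body: map a built-in function over the data, then sum.
        return sum(map(len, self.words))

    def long_words(self, minimum):
        # Functional body: filter with a lambda, no explicit loop or mutation.
        return list(filter(lambda w: len(w) >= minimum, self.words))


stats = WordStats(["erlang", "python", "ruby"])
print(stats.total_length())
print(stats.long_words(6))
```

The same methods could be written with explicit loops and accumulators; the functional version simply moves the iteration into the built-in functions.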
7. In your column Functional Web, one of the good aspects of Erlang is the crash-proof programming, multiple processes. Is that something that you see in other languages, have other languages picked it up in some way?
Steve Vinoski: They see it in Erlang and they try to emulate it. I think in most languages, when you are running a website you have multiple servers anyway, and the element of having always-available services is just kind of built into the domain, if you will. People know they have to keep these websites up, so they go to whatever length they need to keep them up. So you are looking at load balancers, multiple servers and machines, multiple processes per machine, and it's not as big a deal, I would say, in the web server world which language you are using, because you know you have to keep the thing up anyway.
8. I guess you also have to think in the similar way to the supervisor trees in Erlang where you have to make sure that, if something crashes, you have to start it up. It’s not like in Java where you sort of cross your fingers and hope that nothing crashes.
Steve Vinoski: People use various operating system capabilities to make things restart on demand and have processes that watch other processes and that sort of approach and that is very similar to what Erlang gives you out of the box.
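The "processes that watch other processes" approach can be sketched outside Erlang roughly as follows. This is a minimal illustration in Python, not how Erlang supervisors are implemented and not any tool named in the interview: the function `supervise`, its `max_restarts` parameter, and the two sample commands are all assumptions made for the example.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd, restarting it whenever it exits with a non-zero status.

    Returns the number of restarts performed. Gives up once
    max_restarts restarts have been used (hypothetical policy).
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(cmd)  # blocks until the child exits
        if exit_code == 0:
            return restarts  # clean exit: nothing left to watch
        if restarts >= max_restarts:
            return restarts  # restart budget exhausted: give up
        restarts += 1  # child crashed: start it again


# Two stand-in children: one that always crashes, one that exits cleanly.
CRASHING = [sys.executable, "-c", "import sys; sys.exit(1)"]
HEALTHY = [sys.executable, "-c", "pass"]
```

Calling `supervise(CRASHING, max_restarts=2)` starts the child three times in total before giving up, while `supervise(HEALTHY)` returns immediately with zero restarts. A restart limit of this kind keeps a permanently broken child from being restarted forever.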
Bob Ippolito: It's sort of at a different level. WSGI is more about abstracting how you call Python code and return a result, whereas MochiWeb is more concerned with the actual HTTP layer. It is basically a web server, and it gives you a very thin veneer to talk to that socket once the web server is talking to the web browser.
Bob Ippolito: The first application we built with MochiWeb was we wrote our Python web service that collects analytics from Flash Games, the second one was our ad server that serves ads into Flash Games. Since then we’ve got a whole number of other services, most of them speak to Flash clients, using binary protocols which is something Erlang is great at, but HTTP is really convenient for talking to anything on the internet, so that is why it’s a web server and not anything else.
Bob Ippolito: I would put MochiWeb more in the, I don't even know what web servers are named in Ruby, maybe like WEBrick, I am not a big Ruby user. So it's really just sort of the lowest level component that Rails is going to use. It doesn't provide the rest. The only comparison I would make is that MochiWeb also ships with a nice script that allows you to create a new application that depends on MochiWeb, which is one of the things that I guess Rails innovated: being able to create a project from a template, so that people, instead of staring at a blank folder, would be staring at some source code files they could just make small edits to. I think that is the only real similarity.
It doesn't provide any sort of database abstractions or templating libraries, so none of the conveniences that Rails gives you, but you certainly could build something more Rails-like on top. I believe Nitrogen might be a better example, which is a web framework that uses MochiWeb internally.
Bob Ippolito: Exactly right. I believe so. I haven’t personally used Nitrogen myself, I’ve only looked at it sort of in passing, so I can’t speak to exactly what features it has or doesn’t have.
Steve Vinoski: In terms of Yaws and MochiWeb I think some people use one, some people use the other. Bob can correct me if I am wrong, but I think Bob had used Yaws in the beginning and needed more of an embedded server, so instead of having a standalone server with some of his code stuffed inside, he wanted his code to have a server inside. At the time Yaws had some capability for that but it was a little clunky back then; that was before I joined the team. So I think Yaws has a ton of features that MochiWeb probably doesn't have, just because of that difference of the embedded focus versus more of a standalone focus.
Steve Vinoski: Yaws was started by Claes Wikström, who goes by the name Klacke. In 2001 he was going to build a web site for allowing floorball players to register, to sign up and say: "I'll be there on Thursday night" or whatever. So he started with Apache and the LAMP stack, basically, and in his own words he was horrified, so he started building Yaws instead. He'd done a lot of Erlang to that point, I mean he invented parts of Erlang like the bit syntax and distributed Erlang and Mnesia and things that we use every day. So he was very familiar with Erlang but he had never done any web programming with it before, so he started building Yaws. The first commits were in January 2002. He never finished the floorball site though; he finished Yaws instead.
Steve Vinoski: Yes, right. And I was looking for a web server in 2007; I needed something that could really scale well. I was doing some work on set-top boxes at the time, so the plan was to have thousands of set-top boxes coming into a particular machine, and I was looking for something that could do, say, 30000 connections relatively easily. I did some testing, and there is this famous Apache versus Yaws graph that shows it going to 80000 connections. I was able to get to 30000 reasonably, so I chose Yaws at that time. When I started working with it I found a couple of things and sent some patches to Klacke, and I think in 2008 he made me a committer.
Steve Vinoski: A lot of the stuff is built into the VM itself. In my talk that I gave here at Erlang Factory I said that writing web servers in Erlang is actually pretty straightforward. The HTTP packet decoding is built into the VM, so you can set up your socket and say: "I am expecting HTTP packets" and it will give you the headers and things as a data structure. So that's taken care of for you. The polling of the socket and all that is in the VM, so you are really just getting these messages from the VM that say: "Here is an incoming HTTP request." Yaws keeps a process pool, so it has a little cache of Erlang processes that are listening for or accepting incoming connections.
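To make concrete what "gives you the headers and things as a data structure" means, here is a small hand-rolled sketch in Python. It is only an illustration of the idea, not the Erlang VM's decoder (in Erlang this is requested through the `{packet, http}` socket option) and not Yaws code. The function name `decode_request` and the shape of the returned dictionary are assumptions for this example, and the parser handles only a simple, well-formed request.

```python
def decode_request(raw):
    """Turn raw HTTP request bytes into a plain dictionary."""
    # The header section ends at the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: method, target, protocol version.
    method, target, version = lines[0].split(" ", 2)

    # Remaining lines are "Name: value" header fields.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {
        "method": method,
        "target": target,
        "version": version,
        "headers": headers,
        "body": body,
    }


raw = b"GET /hello HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
request = decode_request(raw)
print(request["method"], request["target"], request["headers"]["host"])
```

The point of the built-in decoder Steve describes is that application code never has to write a function like this; it receives the already-decoded structure as a message.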
Steve Vinoski: No, Erlang processes, it’s just one OS process. So I think that helps even though creating a process in Erlang is extremely fast, we just keep a small pool handy and they just pick up and accept. But yes, a lot of it is really due to the VM itself.
Bob Ippolito: In most of our uses of MochiWeb we're speaking with Flash clients, and in many cases we're actually dynamically generating various binary formats that Flash uses, in some cases the Flash bytecode itself. Erlang was really helpful to us there, due to the way it allows you to send IO lists and the way its binary syntax allows you to very easily disassemble and assemble even the strangest binary structures. I had some prototype Python code that was maybe 3 or 4 times as long and much less straightforward. The SWF format is as bad as you can get, with variable bit length fields here and there, Little Endian in some places and Big Endian in others, and Erlang just chews right through that stuff. Other than that, we have a lot of custom in-memory databases, and various other servers have taken advantage of Erlang distribution, concurrency and ETS tables. Basically almost everything Erlang has to offer we're using somewhere in one of our servers.
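To give a feel for why this kind of parsing gets verbose in Python, here is a minimal sketch of reading a record that mixes variable bit-width fields with little-endian and big-endian integers. The record layout is made up for illustration (it is not the actual SWF layout and not Mochi Media's code), and `BitReader` and `parse_record` are hypothetical names.

```python
import struct


class BitReader:
    """Reads unsigned integers of arbitrary bit width, most significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read_bits(self, width):
        value = 0
        for _ in range(width):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def align(self):
        # Skip any padding bits up to the next byte boundary.
        self.pos = (self.pos + 7) // 8 * 8


def parse_record(data):
    """Hypothetical layout: 5-bit width N, two N-bit fields, padding,
    then a little-endian uint16 and a big-endian uint32."""
    reader = BitReader(data)
    nbits = reader.read_bits(5)
    x = reader.read_bits(nbits)
    y = reader.read_bits(nbits)
    reader.align()
    offset = reader.pos // 8
    (little,) = struct.unpack_from("<H", data, offset)
    (big,) = struct.unpack_from(">I", data, offset + 2)
    return nbits, x, y, little, big


# N=3, x=5, y=2 packed into two bytes, then 0x1234 (LE) and 256 (BE).
sample = bytes([0x1D, 0x40, 0x34, 0x12, 0x00, 0x00, 0x01, 0x00])
print(parse_record(sample))
```

In Erlang the same record can be matched in a single bit-syntax pattern, which is the brevity Bob is describing.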
Steve Vinoski: Yes.
Bob Ippolito: Maybe.
Bob Ippolito: I think it's a very interesting project. I am not entirely sure that it's ready to run all of our production software right now, especially with the rapid adoption of NIFs by many of the projects that we use. NIFs are native implemented functions, basically C code, which I think would be a lot more cumbersome to integrate with Erjang or, if not cumbersome, then slow, since it would have to go through the JNI to call these functions.
Bob Ippolito: In some cases. These are custom written C functions that are loaded into the Erlang VM.
Bob Ippolito: Basically they can be used like drivers or they can be used just as sort of a replacement of something that would be slower to do in Erlang like calculating a hash or a checksum those kind of byte based numerics are often much faster to do in C.
Steve Vinoski: I originally wrote it. It's SHA-2, so it's the 256, 384 and 512 functions, and I originally wrote it, I think, just on a whim. I saw some questions on a list where someone was looking for these and said: "I could do that in Erlang". So I wrote it initially in pure Erlang and it worked and some people used it, but it's slow. As Bob said, some of those kinds of calculating functions are just slow in Erlang; people know that. So recently I rewrote it in C, using NIFs, and it's quite a bit faster. If you run the test suite for the functions I think it takes about 30 seconds on the original code and less than a second on the NIF code.
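For readers unfamiliar with the SHA-2 family, the variants Steve mentions differ mainly in digest size. The short Python sketch below uses the standard library's `hashlib` module (whose hashing is itself implemented in native code rather than pure Python) purely to show the three functions side by side; it is not Steve's Erlang library, and the helper name `sha2_digests` is an assumption for this example.

```python
import hashlib


def sha2_digests(data):
    """Return the hex digests of data for three SHA-2 variants."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "sha384": hashlib.sha384(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
    }


digests = sha2_digests(b"abc")
for name, hex_digest in digests.items():
    # Each hex character encodes 4 bits, so length * 4 is the digest size in bits.
    print(name, len(hex_digest) * 4, "bits")
```

These are the byte-oriented numeric loops that, as both speakers note, tend to run much faster in C than in pure Erlang.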
Steve Vinoski: Yes, definitely, it’s C code, so you can be as dangerous as you like and if you are too dangerous you are going to crash the VM and that’s what makes it hard. NIFs are a lot easier than drivers. Drivers have a particular kind of an entry table that you have to fill in certain functions and the VM is going to call you back. A NIF just looks like an Erlang function from the outside and it happens to go into the C code. So you can have threads and all kinds of stuff in NIFs as well, but it’s generally not used that way and for me personally when I have to write things that need threads I’d go the driver route.
Steve Vinoski: There is an API. There are types you have to use to represent Erlang terms within the C code, there are functions that let you operate on the terms and create the terms, and there are functions that help you with allocating memory and those sorts of things. It's a really nice API and NIFs are pretty simple to write, so I think, as Bob said, you are starting to see more people doing that for certain elements of their computation or the problem they are trying to solve, just because you can go faster sometimes or because they need to integrate with something that is written in C. For example, I had to write a UUID NIF just to call the UUID library on Linux for one of my projects, so it's very simple.
Steve Vinoski: Right, it’s very fast.
Bob Ippolito: Frequently or with a large amount of data, you don’t always want to send megabytes over a socket when you don’t have to.
Bob Ippolito: I believe you get direct references to the terms, so it’s basically zero copy. You may be allocating data on your way out, but on the way in there should be no copies.
Bob Ippolito: I believe that anything is possible in C, but it’s certainly discouraged.
Bob Ippolito: Thank you.
Steve Vinoski: Thank you.