Bio Francesco Cesarini is the founder and CTO of Erlang Training and Consulting. He has used Erlang for 15 years, having started his career as an intern at Ericsson's computer science lab with the inventors of Erlang. Simon Thompson is Professor of Logic and Computation at the University of Kent. He has written several books on functional programming and is co-author with Cesarini of Erlang Programming.
The Erlang Factory is an event that focuses on Erlang - the computer language that was designed to support distributed, fault-tolerant, soft-realtime applications with requirements for high availability and high concurrency. The main part of the Factory is the conference - a two-day collection of focused subject tracks with an enormous opportunity to meet the best minds in Erlang and network with experts in all its uses and applications.
Francesco Cesarini: I am Francesco. Lately I have been programming Erlang and I’ve been doing it for 15 years. We expanded Erlang Solutions, we’ve doubled in size in the past year and expect to double in size next year as well.
Simon Thompson: I’ve been here this week, I’ve been teaching a course for beginners in Erlang, so that’s very fresh in my mind, thinking about what it is that people get about Erlang and what it is that people find more and more difficult. I have been talking about Wrangler, which is a refactoring tool we’ve been building for Erlang. And I have also been talking to people at the conference about using that and also about testing and what we’ve been doing on the ProTest European project.
2. You both do training, and you are also co-authors of an Erlang book. Newcomers often bring wrong assumptions and misconceptions about the language when they start programming in Erlang. Can you tell us about some of these misconceptions?
Simon Thompson: I think what you see is that there are things in the language that are missing. People are used to doing computing by changing the values of variables, if they are doing C or Java programming, so Erlang is different in that respect. There are no control constructs, where is the for loop, where is the while loop? So I think that is one of the things that first floors people.
Francesco Cesarini: It’s just getting used to a slightly different paradigm and it all depends on their background. There are usually three hurdles associated to understanding Erlang; the first one comes with pattern matching and using pattern matching correctly. You will see someone coming from a C or a Java background, instead of using pattern matching they will use if statements. The second is recursion.
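As a minimal sketch of the pattern-matching hurdle described above: the first function selects a clause by the shape of its argument, while the second does the same work the way someone used to if statements might write it. The module name `shapes`, the function names, and the tuple formats are hypothetical and chosen only for illustration.

```erlang
-module(shapes).
-export([area/1, area_with_if/1]).

%% Idiomatic Erlang: one clause per shape, selected by pattern matching.
%% The fields are bound directly in the function head.
area({square, Side}) -> Side * Side;
area({circle, Radius}) -> math:pi() * Radius * Radius;
area({rectangle, Width, Height}) -> Width * Height.

%% The same logic written with an if expression, picking the tuple
%% apart by hand. It works, but it is longer and harder to read.
area_with_if(Shape) ->
    Type = element(1, Shape),
    if
        Type =:= square ->
            element(2, Shape) * element(2, Shape);
        Type =:= circle ->
            math:pi() * element(2, Shape) * element(2, Shape);
        Type =:= rectangle ->
            element(2, Shape) * element(3, Shape)
    end.
```

Both functions fail for an unknown shape; neither adds a catch-all branch that returns an error value.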
Francesco Cesarini: They are using readability, concise code and, I think, the beauty of functional programming. Pattern matching was originally influenced by ML and by the logic programming paradigm, which Erlang adopted.
Simon Thompson: One of the places where I think it’s most tricky is just the concrete syntax for lists. I don’t want to go into the details here, but Erlang inherited the concrete syntax that Prolog had, which is slightly confusing in that sometimes something in square brackets is a list, other times you can refer to a list without putting it in square brackets and there is some cognitive dissonance there. I’ve certainly seen that with one or two of the students this week: they weren’t sure whether to put a list inside square brackets or not and that is confusing particularly because Erlang isn’t statically typed. So in a language like Haskell, if you get the type of your arguments wrong or the type of the pattern wrong, you are more likely to get a type error, whereas with Erlang what is more likely is that you will get behavior that you don’t quite understand.
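The bracket confusion can be illustrated with a small sketch. In the pattern `[Head | Tail]`, the variable `Tail` is already a list, so it is passed on without brackets. The module name `lists_demo` and its functions are hypothetical.

```erlang
-module(lists_demo).
-export([sum/1, first_two/1]).

%% [Head | Tail] splits a non-empty list into its first element and
%% the remaining list. Tail is itself a list, so the recursive call
%% is sum(Tail), not sum([Tail]).
sum([]) -> 0;
sum([Head | Tail]) -> Head + sum(Tail).

%% Brackets with commas match individual elements; the part after
%% the bar is again the rest of the list.
first_two([A, B | _Rest]) -> {A, B}.
```

Writing `sum([Tail])` by mistake would wrap the remaining list in another one-element list. The compiler accepts it, and the problem only shows up at run time, which is the kind of puzzling behavior mentioned above.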
Francesco Cesarini: Another thing which pattern matching often leads to is defensive programming which should be avoided in Erlang. If you come to an inconsistent state or you get corrupt data, you should make a program crash.
Francesco Cesarini: Exactly, let’s give an example. Assuming you are mapping the digits 1 to 7 to the days of the week Monday to Sunday, if you get a digit number 8 instead of returning your error "Unknown day", you actually don’t want to return anything, you just want to make that process terminate. And people have a hard time letting go of this and instead they are used to trapping exceptions and then trying to fix it, but if you get a date which doesn’t exist, how do you fix it? You can’t. So let your process crash and let someone else deal with it. There will be another part of the program which will uniformly handle all of your corrupt data and all of your bugs in the software.
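The day-of-week example can be sketched as follows. The module and function names are hypothetical; the point is only that there is no clause for 8 and no "Unknown day" return value.

```erlang
-module(days).
-export([day_of_week/1]).

%% Maps the digits 1 to 7 to the days Monday to Sunday.
day_of_week(1) -> monday;
day_of_week(2) -> tuesday;
day_of_week(3) -> wednesday;
day_of_week(4) -> thursday;
day_of_week(5) -> friday;
day_of_week(6) -> saturday;
day_of_week(7) -> sunday.
%% Deliberately no catch-all clause. Calling day_of_week(8) raises a
%% function_clause error and, unless something catches it, the
%% calling process terminates.
```

The defensive alternative would add a final clause such as `day_of_week(_) -> {error, unknown_day}`, which pushes the problem onto every caller instead of onto one uniform error handler.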
Simon Thompson: What is interesting about that is that it’s an architectural assumption and it’s quite hard to teach that approach to somebody who is a newcomer to programming. But what is interesting is somebody who is a newcomer to Erlang but is an experienced programmer, might not find that so difficult, because they are used to thinking of a small piece of functionality fitting into a wider picture. So being able to say: "We will build this in a compositional way where we treat successful behavior in one way and erroneous in another", people say: "Yes, I get that." So it’s easy to get across in the right context.
Francesco Cesarini: And I think one of the studies which clearly shows the benefits of not having defensive programming was done at Heriot-Watt University. They rewrote some C++ applications in Erlang and, depending on how you count, got a 4-20 times reduction in code. The really interesting part is when you went in and looked at what individual lines of code did: about 17-18% of the C++ code was defensive programming and error handling, versus 1% in Erlang. So you hear the Erlang programs were 4-10 times smaller in size. That is one of the reasons.
Simon Thompson: I guess another reason for that is the fact that again you are making architectural assumptions, so there are libraries, generic behaviors for supervision, so there is infrastructure that handles that. So there is room to be skeptical about 20 times reduction.
Francesco Cesarini: It all depends on how you count it. I mean that is why we say 4-20 times. The first experiments we were doing with Erlang got similar numbers, but I think they went down and said a 4 times reduction in code; it sounded incredible, and no one would have believed them otherwise. And I think there is one more area which beginners struggle with and that is concurrency and being able to actually start thinking concurrently and just thinking in terms of one event at a time.
5. People see things like the multicore challenge and that is why they go to the Erlang world, because they expect their programs to be more performant and more reliable. But then actually concurrency is a whole paradigm and even if you have one core it’s still interesting to use and it’s still a way of thinking about code and often people confuse concurrency with parallelism or multicore.
Francesco Cesarini: The whole idea with multicore in Erlang is to hide it from the programmer. A programmer should program without being aware of what type of architecture and processor the system is going to run on and ideally then the program will actually scale. So most of the time you shouldn’t be aware of multicore. Sometimes you have to for optimizations, but most of the time you can go ahead without it.
Simon Thompson: Just jumping back a bit to one of the things I just alluded to in passing is this lack of control structures and what is important to get across is how important recursion is in providing the sort of looping behavior. And I think there are two parts to the explanation there. One is showing people how a recursive function, particularly a tail recursive function is formally very like a function with a jump at the end, jumping back to the beginning of the code, perhaps with a modification in the parameters of the call, so modifying some state data. So I think you can explain things intellectually and people will see that, but then getting people to take problems they understand and solve them in that way.
So there are two stages, so one guy, after they were coming in on our second morning of the course this week, said: "I did those exercises last night and now I get it." So I’d explain it and he’d gone through one example himself, but he did some more and it’s just internalizing that pattern.
Francesco Cesarini: Same experience here. You go in, you explain recursion on a white board, you step through all the steps and there are 4 or 5 different patterns of recursion and once you know these 4 or 5 patterns you are fine and everyone sits and nods, understands or goes through it on paper. Then you give them the exercises and that is where people will blank out because they don’t have the knowledge on what pattern to apply to different problems, but it just comes very quickly and once you’ve gotten over that it feels like a natural way of dealing with things.
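One of the recursion patterns discussed above, a tail-recursive function standing in for a loop, might look like the sketch below. The module name `loops` and the function `sum_to` are hypothetical. The accumulator argument plays the role of the variable a for loop would update.

```erlang
-module(loops).
-export([sum_to/1]).

%% Roughly what a C programmer would write as:
%%   acc = 0; for (i = n; i > 0; i--) acc += i;
sum_to(N) when is_integer(N), N >= 0 ->
    sum_to(N, 0).

%% The recursive call is the last thing the clause does, so it acts
%% like a jump back to the top with modified parameters (N - 1 and
%% the updated accumulator) and uses constant stack space.
sum_to(0, Acc) -> Acc;
sum_to(N, Acc) -> sum_to(N - 1, Acc + N).
```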
Simon Thompson: I think my experience of teaching the concurrent part of the language is that it does come very easily, the idea that you have these separate flows of control, which don’t share any data, and you can see how those map onto separate processes within the virtual machine, with separate garbage collection and so on. People can understand very quickly how that is a very attractive model. The combination of that, this process model together with a linking model to deal with termination and so on it’s a very nicely designed language. The way that the mailboxes work, the way that linking works are not accidental, they all fit together very well and I think it’s very easy to design concurrent languages that don’t work and Erlang really does win on that.
Francesco Cesarini: I mean I think it just shows the iterative approach they took when they were inventing Erlang. They tried constructs out: if they helped reduce the code while solving the problem, they kept them; if they had no benefits, they removed them. As a result, Erlang has very few constructs and there is actually not that much to learn, meaning people can learn it relatively quickly. Back in the Ericsson days, after attending a 4-day course people were programming phone switches where you can actually call from one phone to another and that is coming in with no knowledge of Erlang at all.
Simon Thompson: What is interesting, as well, about that is there was a very clear application area and that was tensioning all the design and you contrast that with other languages where it’s a much more academic exercise. So I think the fact that there were specific performance, reliability and fault tolerance requirements coming from the particular application area meant that it made the designers work much harder.
Francesco Cesarini: And you are referring to telecoms, but actually if you look at it today those requirements are valid in switching and banking, financial switches, in messaging services; they are valid in a lot of other areas, not just telecoms and that is why Erlang is spreading the way it is.
6. Let’s imagine there is someone that doesn’t know at all about Erlang or the actors model. Can you just give me an image or a view of what will I find in Erlang, all the components, how actors work with the mailboxes and then the supervision control and how do I program inside each actor and all of these things so that I can get a global image of the whole technology?
Francesco Cesarini: You’ve got processes that will not share memory, they will communicate with each other through message passing. So when you send a message from one process to another, you actually copy the data from the heap of one process to the heap of the other. Messages get a place in the process mailbox and they get read through selective receives, so you go in and you try to pattern-match the first message in the message queue. If it doesn’t match you will try the second and you continue, keeping the ones which don’t match in the queue. So those are your basic constructs. You then go out and you provide what we call "behaviors" to these processes.
Francesco Cesarini: That is right.
Francesco Cesarini: "Selective receives" means that you can go into the mailbox and pick out only the message you want to receive and you leave all the other messages in there. This really comes in handy when you are building, for example, finite-state machines, which can be really complex. If you think of telecom systems, often you will receive messages in your finite-state machine which are out of sequence, maybe because they have taken a different route from the previous messages. If you need to start handling these messages when you are outside of that particular state it becomes complex: you need to read them, then you need to store them, then you need to move to the next state, then you need to see "Have I stored any messages?" and then handle them.
Instead you just leave them in the mailbox. You go in and then just pick out the messages which are relevant to that particular state, you execute your actions, you have your state transition and when you are in your new state you just go into the mailbox and see "Are there any messages which match, which I can receive in this particular state?"
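A small finite-state machine can sketch how selective receive handles an out-of-sequence message. The module name `selective`, the states `idle` and `dialing`, and the message formats are all hypothetical. In `demo/0` the digit is sent before the off-hook event, yet it is only processed once the machine has reached the state that understands it.

```erlang
-module(selective).
-export([start/0, demo/0]).

start() -> spawn(fun idle/0).

%% State idle: only off-hook is relevant. Any other message does not
%% match and simply stays in the mailbox.
idle() ->
    receive
        {offhook, From} ->
            From ! {state, dialing},
            dialing()
    end.

%% State dialing: digits and on-hook are relevant here.
dialing() ->
    receive
        {digit, Digit, From} ->
            From ! {dialed, Digit},
            dialing();
        {onhook, From} ->
            From ! {state, idle},
            idle()
    end.

demo() ->
    Fsm = start(),
    Fsm ! {digit, 5, self()},   % arrives "too early", while still idle
    Fsm ! {offhook, self()},
    First = receive {state, dialing} -> dialing after 1000 -> timeout end,
    Second = receive {dialed, Digit} -> {dialed, Digit} after 1000 -> timeout end,
    exit(Fsm, kill),            % clean up the demo process
    {First, Second}.
```

No code in `idle/0` reads, stores or replays the early digit; the mailbox does that work.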
Simon Thompson: So it’s asynchronous communication, that is one thing to say and that supports this mailbox model. And I think processing things in the order in which you want to process them, instead of having to deal with every possible order in which they might have arrived, that is a wonderful abstraction that works tremendously well. And again that is the place where it is very well thought out because you can’t do that with synchronous communication, so it rules out certain sorts of deadlock. I mean it can introduce other problems: potentially your mailbox can fill up with messages like junk mail. If you don’t process them they will remain in the mailbox, but that is less of a problem than hitting a deadlock. So it supports a modular style of programming.
Francesco Cesarini: Exactly. And the fact that there is no shared memory, if one of these processes crashes you can restart it and as long as you recreate its state you will not affect other processes around it. The biggest danger would have come had there been shared memory. If a process crashes when it’s in a critical section, dealing with some aspect of shared memory, anyone who uses or accesses that memory will also have to be terminated. And so, that, once again, adds even more complexity when you program, which you don’t have with Erlang.
Simon Thompson: The other point I would make is I am sometimes surprised that people use the word "actor" in this context, as it has slightly the wrong connotation. Actors may be somewhat heavy-weight and one thing about Erlang processes is they are very light-weight, creation is very light-weight, communication is very efficient. So it’s easy to create tens of thousands of Erlang processes running simultaneously. One tends to think of actors as being handfuls of actors, but you can have tens of thousands of processes working quite happily together. So it’s light-weight and that of course is one reason it can scale to multicore very well because the model encourages you to use as much concurrency as there is in your problem. So if you are dealing with requests to a web server you don’t have a number of threads that you share; every request creates a new process, a new thread, which exists for the length of time that that request is processed and then it dies and you use another thread.
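The process-per-request idea can be sketched as below: one short-lived process per unit of work, each reporting back and then terminating. The module name `many` and the "work" (squaring a number) are placeholders, not anything from the interview.

```erlang
-module(many).
-export([run/1]).

%% Spawns N processes, one per "request". Each computes a result,
%% sends it to the parent and then terminates.
run(N) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {done, self(), I * I} end)
            || I <- lists:seq(1, N)],
    %% Collect one reply per process; matching on Pid ties each
    %% reply to the process that produced it.
    Results = [receive {done, Pid, Result} -> Result end || Pid <- Pids],
    lists:sum(Results).
```

Calling `many:run(10000)` creates ten thousand processes, which is within the default process limit of the virtual machine.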
Francesco Cesarini: You just free up all the memory.
Francesco Cesarini: That is correct. It’s a per process generational garbage collector, incremental garbage collector, which means it reduces the impact when garbage collecting and you just garbage-collect when you need to free more memory on a process basis.
Francesco Cesarini: Behaviors are about providing a process with a particular behavior. So there is this set of libraries which you can use and you give your process a behavior such as a client/server behavior or a finite-state machine behavior or an event handler behavior and all of these are then supervised by what we call a supervisor behavior. And they provide the generic framework which will be the same from one process to another, from one program to another, so sending a message, receiving it and sending back a reply.
Francesco Cesarini: A supervisor will monitor children, for example. Picture two children or two processes; one process is handling a web page, serving a web page, a web server, the other is handling an SMS. If either of these two processes terminates abnormally, because of a bug or because of corrupt data, the supervisor will handle them in a very similar way: it will receive a notification that the process has terminated abnormally and, based on certain parameters, it will decide whether to restart it or to terminate itself. So it’s all generic functionality, which will be the same from one system to another and this generic functionality is packaged into library modules which we refer to as behaviors.
Simon Thompson: The other thing is it modularizes successful and erroneous behavior very well. You can see these things in a tree, where at the leaves you have the processes which will perform the actions, which we usually refer to as "workers", and then they are supervised by other processes which make sure that the erroneous states that these workers might get into are dealt with properly, data is cleaned up, they're perhaps restarted or whatever. So workers work and supervisors supervise. Workers do normal behaviors and supervisors deal with erroneous behavior. And there are some links in the model which push the two together, but having that separation is very important.
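A supervision tree with two workers, in the spirit of the web page and SMS example, could be sketched with the OTP `supervisor` behavior as below. The module name `demo_sup`, the registered names `web_worker` and `sms_worker`, the restart limits and the `{corrupt, _}` message are all assumptions made for illustration. The workers are plain linked processes to keep the sketch in one module; real workers would normally be behaviors themselves. Map-based child specifications assume OTP 18 or later.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Generic part: the restart strategy and the list of children.
%% one_for_one restarts only the child that died; more than 3
%% restarts within 10 seconds makes the supervisor itself give up.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 10},
    Children = [
        #{id => web, start => {?MODULE, start_worker, [web_worker]}},
        #{id => sms, start => {?MODULE, start_worker, [sms_worker]}}
    ],
    {ok, {SupFlags, Children}}.

%% Specific part: a trivial worker with no error handling at all.
start_worker(Name) ->
    Pid = spawn_link(fun worker_loop/0),
    true = register(Name, Pid),
    {ok, Pid}.

worker_loop() ->
    receive
        {corrupt, _Data} -> exit(corrupt_data);   % just crash
        _Other -> worker_loop()
    end.
```

After `demo_sup:start_link()`, sending `web_worker ! {corrupt, x}` kills that worker, and shortly afterwards `whereis(web_worker)` returns a fresh process identifier while `sms_worker` is untouched.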
Francesco Cesarini: We talked about a 20 times decrease in code size; that is when you start using these behaviors and these library modules, which in other programming languages won’t necessarily exist. So that is why you get such a huge reduction.
Francesco Cesarini: Code reuse, but the most important part is it allows you to write simpler code. If you don’t need to worry about error handling your code becomes easier to write, easier to maintain and by default will have fewer bugs in it.
Simon Thompson: The other part of this package is the Open Telecom Platform, the OTP libraries, which is for more than telecoms, but it’s a set of libraries, of modules, that provide generic behavior. So the supervisor helps you deal with failure, but there are other things: you can generically produce a server and the only thing the programmer has to provide is the specific behavior that that server has. So for instance the messages it will receive, how those messages affect its internal state and the replies that you get. So the programmer supplies that and the generic server behavior provides all the communication that is needed, all the robustness that’s needed and so on. So programmers don’t have to worry about the concurrent parts or perhaps the distributed parts, all they have to do is provide that relatively straightforward specific functionality.
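The split between generic and specific code can be seen in a minimal `gen_server` sketch. The module name `counter` and its messages are hypothetical. The callbacks only say which messages the server accepts, how they change the state and what the reply is; the receive loop, the request/reply protocol and the timeouts live in the library. The sketch assumes OTP 20 or later, where `handle_info/2` is an optional callback.

```erlang
-module(counter).
-behaviour(gen_server).

%% Client API
-export([start_link/0, increment/1, value/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

start_link() -> gen_server:start_link(?MODULE, 0, []).

%% Asynchronous request: no reply expected.
increment(Pid) -> gen_server:cast(Pid, increment).

%% Synchronous request: blocks until the server replies.
value(Pid) -> gen_server:call(Pid, value).

%% The specific behavior: initial state, and the effect of each
%% message on the state.
init(Start) -> {ok, Start}.

handle_call(value, _From, Count) -> {reply, Count, Count}.

handle_cast(increment, Count) -> {noreply, Count + 1}.
```

A typical session would be `{ok, Pid} = counter:start_link()`, two calls to `counter:increment(Pid)`, and then `counter:value(Pid)`, which returns 2.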
Francesco Cesarini: It handles all of the tricky parts and all the dangerous parts of concurrency and it takes them away from the program so the programmer doesn’t have to worry and think about them. And should you get a deadlock, there is a deadlock prevention mechanism in it and it will automatically terminate processes. If you send a request to a process and the receiving process doesn’t exist, the sending process will terminate. If you send a message to a process and the process terminates while it’s handling that message, the process which sent the message will also terminate. So there are lots of things behind the scenes which a programmer is not aware of when he’s using these libraries, which remove and prevent a lot of the errors which would otherwise occur.
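The case of a request sent to a process that does not exist can be sketched with `gen_server:call/2`. The module name `call_dead` is hypothetical, and the `try ... catch` is there only so the demo can report what happened; in normal code the exit would be left alone and the caller would terminate.

```erlang
-module(call_dead).
-export([demo/0]).

demo() ->
    %% Start a process that terminates immediately and wait until
    %% the monitor confirms it is gone.
    {Pid, Ref} = spawn_monitor(fun() -> ok end),
    receive {'DOWN', Ref, process, Pid, _} -> ok end,
    %% The generic server code notices that there is nobody to
    %% answer and makes the caller exit instead of waiting forever.
    try gen_server:call(Pid, ping) of
        Reply -> {unexpected_reply, Reply}
    catch
        exit:{Reason, _CallInfo} -> {caller_exit, Reason}
    end.
```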
Simon Thompson: Yes and one of the nice things is that to a first approximation the programmer needn’t be aware where the processes are.
Francesco Cesarini: You can monitor processes on different machines.
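Monitoring can be sketched with `erlang:monitor/2`. The example below runs on a single node, but the same call accepts a process identifier that lives on another node. The module name `watch` and the exit reason `crashed` are hypothetical.

```erlang
-module(watch).
-export([demo/0]).

demo() ->
    Pid = spawn(fun() -> receive stop -> exit(crashed) end end),
    %% Ask the runtime to notify us when Pid terminates.
    Ref = erlang:monitor(process, Pid),
    Pid ! stop,
    %% The notification is an ordinary message in our mailbox.
    receive
        {'DOWN', Ref, process, Pid, Reason} -> {down, Reason}
    after 1000 ->
        timeout
    end.
```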
15. What protocols?
Francesco Cesarini: We have what we refer to as an Erlang node. An Erlang node is an instance of the VM and you can have many Erlang nodes interacting with each other that can be on the same machine or on different machines. And they use TCP/IP to connect to each other.
Francesco Cesarini: Yes, you have got a process identifier, this process identifier could point to a process on your local node or on a remote node and you as a programmer don’t need to know if it’s a local or a remote node; you are sending a message and it will just send it off. And so by doing these things right from the start you can easily have a program which is written for a single node and you can easily distribute it across a cluster of nodes as your load increases.
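Location transparency can be sketched with a small echo process. The module name `echo` is hypothetical. `ping/1` only needs a process identifier; nothing in it depends on whether that identifier refers to the local node or a remote one.

```erlang
-module(echo).
-export([start/0, ping/1]).

%% Starts the echo process on the local node. A variant using
%% spawn(Node, Fun) would start it on another connected node.
start() -> spawn(fun loop/0).

loop() ->
    receive
        {ping, From} ->
            From ! {pong, node()},   % reply with the node we run on
            loop()
    end.

%% The send is written the same way for local and remote pids.
ping(Pid) ->
    Pid ! {ping, self()},
    receive
        {pong, Node} -> Node
    after 1000 ->
        timeout
    end.
```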
Simon Thompson: The promise that the Erlang runtime system provides is that between any two processes the order of messages will be preserved. So if you have two processes and a message is sent from here to there, then another message is sent from here to there, the first message is delivered before the second and that promise holds whether you are on the same node or on different nodes. The only difference about nodes is that I guess a node can go down and a system can get disconnected from the network, but if they are delivered, then the messages will be in the same order. So it does the right thing and it is almost transparent.
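The ordering promise between one sender and one receiver can be observed with a small sketch. The module name `ordering` and the message format are hypothetical. One process sends the numbers 1 to N, and the collector reports them in the order it received them.

```erlang
-module(ordering).
-export([demo/1]).

demo(N) ->
    Parent = self(),
    Collector = spawn(fun() -> collect(N, [], Parent) end),
    %% A single sender, a single receiver: messages arrive in the
    %% order in which they were sent.
    lists:foreach(fun(I) -> Collector ! {seq, I} end, lists:seq(1, N)),
    receive
        {collected, List} -> List
    after 5000 ->
        timeout
    end.

%% Takes messages in arrival order and returns them to the parent.
collect(0, Acc, Parent) ->
    Parent ! {collected, lists:reverse(Acc)};
collect(Remaining, Acc, Parent) ->
    receive
        {seq, I} -> collect(Remaining - 1, [I | Acc], Parent)
    end.
```

The guarantee is per pair of processes; messages from different senders to the same receiver may interleave in any order.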
Francesco Cesarini: What you need is a very thin layer. I look at other programming languages and people tell me about other complex frameworks to handle the messaging between nodes and want to know how we do it in Erlang. There is no framework, you just write a very thin layer about 50 lines of code, if that, which will deal with it for you and they often sound incredulous but the advantage here is Erlang has distribution built into the language, the other languages don't.
Francesco Cesarini: It will just monitor whether your remote node goes down and keep track of the pool of nodes. If you want to send a message to one of many nodes, slave nodes, it will just pick one at random or it will hash to them. So it’s just a very thin communication layer which will monitor the connectivity between the nodes and handle those cases where your TCP/IP connection goes down or the node itself terminates.
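A thin layer of this kind might look like the sketch below, which is an illustration and not the code used by the speakers. It relies on `net_kernel:monitor_nodes/1` to receive `nodeup` and `nodedown` messages and picks a random node from the current pool. The module name `node_pool` and the `pick` protocol are hypothetical.

```erlang
-module(node_pool).
-export([start/0, pick/1]).

%% Starts a process that tracks which nodes are currently connected.
start() ->
    spawn(fun() ->
        %% Subscribe to nodeup/nodedown messages. The return value
        %% is ignored because it differs when the local node is not
        %% running in distributed mode.
        _ = net_kernel:monitor_nodes(true),
        loop(nodes())
    end).

loop(Nodes) ->
    receive
        {nodeup, Node} ->
            loop([Node | lists:delete(Node, Nodes)]);
        {nodedown, Node} ->
            loop(lists:delete(Node, Nodes));
        {pick, From} ->
            From ! {picked, choose(Nodes)},
            loop(Nodes)
    end.

%% Picks one node at random; hashing on a key would be the
%% alternative mentioned above.
choose([]) -> none;
choose(Nodes) -> lists:nth(rand:uniform(length(Nodes)), Nodes).

pick(Pool) ->
    Pool ! {pick, self()},
    receive
        {picked, Node} -> Node
    after 1000 ->
        timeout
    end.
```

On a node with no connections, `node_pool:pick(Pool)` returns `none`; once other nodes connect, it returns one of them.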
Francesco Cesarini: Instant messaging, for example. Think of instant messaging where you store the session information in a table and every time you receive an IM or you want to send an IM or you receive a status update you just create a new process. As a status update you create a new process, you update your local database and then you push it out to the client. When a client has received your status update you terminate that process. Or the same applies to sending an IM: you send an IM from a client, it’s received on the server side, you create a process which will then forward the message on to the buddy you are messaging. When your buddy’s received that message you terminate your process. That is one example.
Simon Thompson: Soft realtime, fault tolerant, robust, communication intensive, e-commerce.
Francesco Cesarini: Web servers, messaging, short message services, financial switches.
Francesco Cesarini: It’s all about using the right tool for the right job.