Ulf Wiger on Robustness and Scalability in Erlang


1. Can you tell us about yourself and what you’ve been up to lately?

Lately, since February of 2009 I have been the CTO of Erlang Solutions, formerly Erlang Training and Consulting. Before that, I worked 13 years at Ericsson, so it’s been quite a change going from a company of about 100,000 employees to a company of about 50. It’s been a very interesting year.

   

2. What do you do in Erlang Solutions?

Erlang Solutions works with training, consulting, in-house development. We’re a fairly distributed company; we have offices in London, Poland, Sweden. We have some consultants living in Italy and we’re hoping to open offices in other countries as well. One of my main challenges is to try to find good ways of working together and getting the information flow inside the company as efficient as possible.

One of the big problems of being in a consulting company is that very often many of your staff are out on site, maybe working for years at a certain customer site. Then, trying to get the company to feel like a company and getting everyone to feel part of what’s going on is one of the biggest challenges for a consulting company. It’s a new and refreshing challenge for me.

   

3. Lately we’ve been hearing a lot about languages that address problems of multi-core programming, like Erlang with the concurrency model and there is Haskell with functional programming and others like Clojure with software transactional memory. Here, a developer gets a bit lost between all these paradigms - "What paradigm should I use for my problem?" What do you think of that?

It certainly is a bit confusing and also a very quickly evolving area. With multi-core, there are different competing architectures and a lot of different problems that we are only beginning to understand. Then we have the real problem with the legacy code. In many cases you don’t really get the choice of, for example, switching to an entirely different language. You are pretty much stuck with the model that you have in your application and the big challenge is to try to evolve that into something that works on the new architectures. That was the case with my previous employer: a company like Ericsson, which has been around for 120 years, has a lot of legacy code.

That becomes the dominating problem. Given that we have this code and it has this architecture, how do we take advantage of the new CPU architectures? I guess there are 2 very important sides to it. One is evolving legacy code and the other is, given that you have the choice, which models fit best for your problem. There, I think, there are some very exciting developments, like you said, with Haskell, Erlang and Clojure, all 3 of which represent slightly different attacks on the problem. I think they are usually complementary.

   

4. Can we start with the Erlang model? How does it handle these 2 problems that you are talking about?

Erlang grew out of the telecom sector. It is more or less an agent-based language, even though that label has been sort of retrofitted on top of Erlang. It wasn’t deliberately an agent-based language, but the people who come from agent-based computing likened it to agent-based languages. You have lightweight processes that communicate with messages and, if you will, share-nothing concurrency and it’s a model that fits very well with messaging systems. In telephony it’s very much a messaging system when you’re setting up connections. There are protocols, machines communicating over the network.

You are getting messages in from the network, you need to translate them to messages being sent to the next box in the network and the big problem to solve there is coordination with lots of concurrent activities. For that type of problem, message-passing and especially share-nothing message-passing is a very good fit. In the Haskell community, if you would like to do Erlang style concurrency in Haskell, you can. That’s not really the focus of the concurrency research that goes on in Haskell, they focus more on data parallelism.
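The message-translation pattern described above can be sketched roughly as follows. This is only an illustration in Python, not Erlang: threads and `queue.Queue` mailboxes stand in for Erlang's lightweight processes, and the message shapes (`"setup"`, `"connect"`) and the names `translator` and `run_relay` are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling a process to shut down


def translator(inbox, outbox):
    """Receive messages, translate them, and forward them to the next box."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(STOP)
            return
        # Hypothetical translation step: an incoming "setup" request
        # becomes a "connect" request for the next hop.
        kind, call_id = msg
        if kind == "setup":
            outbox.put(("connect", call_id))
        else:
            outbox.put(("ignored", call_id))


def run_relay(messages):
    """Push messages through one translator and collect what comes out."""
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=translator, args=(inbox, outbox))
    worker.start()
    for msg in messages:
        inbox.put(msg)
    inbox.put(STOP)
    results = []
    while True:
        out = outbox.get()
        if out is STOP:
            break
        results.append(out)
    worker.join()
    return results
```

The translator only ever touches its own mailbox and the mailbox of the next hop, which is the coordination style the answer describes.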

   

5. First you talked about share-nothing messaging. What exactly do you mean by share-nothing?

The traditional way of doing concurrency in C, for example, and also in Java is that you have shared memory and you have processes or threads share data. And they actually have to take turns accessing shared objects. In Erlang, processes don’t, at least logically, share any data at all. There can be sharing going on behind the scenes, but to the Erlang programmer, each process has its own copy. So, when you send a message it appears as if the message is copied to the other side. Of course, when you send a message over the network, there has to be copying, because there is no shared memory between 2 computers on the network.

The idea with Erlang is that the model is symmetric. There always appears to be copying. The concurrency model in Java is specifically shared data concurrency, where you have synchronized objects where you can define a critical region where only one process at a time can enter and the others have to wait. That is a different model. You can’t really say that one model is better than the other. Sometimes shared memory concurrency allows you to increase performance tremendously, whereas the point of no-shared-data concurrency, in the case of Erlang, was primarily to increase robustness.
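To make the two models concrete, here is a minimal sketch in Python. It assumes that a deep copy on send is an acceptable stand-in for Erlang's "appears to be copied" semantics, and that a lock-guarded counter is a fair stand-in for a Java synchronized region. The names `send`, `SharedCounter`, `demo_copy` and `demo_shared` are made up for the illustration.

```python
import copy
import queue
import threading


def send(mailbox, message):
    """Share-nothing send: the receiver gets its own deep copy."""
    mailbox.put(copy.deepcopy(message))


class SharedCounter:
    """Shared-memory style: one object, guarded by a critical region."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # only one thread at a time may enter
            self.value += 1


def demo_copy():
    mailbox = queue.Queue()
    original = {"digits": [1, 2, 3]}
    send(mailbox, original)
    original["digits"].append(4)  # sender mutates after sending
    return mailbox.get()          # receiver's copy is unaffected


def demo_shared(threads=4, per_thread=1000):
    counter = SharedCounter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value
```

In the first case nothing the sender does later can reach the receiver's data. In the second, correctness depends on every access going through the lock.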

If a process dies, it cannot corrupt the memory of another process. So, from a robustness point of view, if you want to have a telecom system that is always up and can withstand software failures without dying, this is a very good model, because you preserve the integrity of the processes, even if processes around them are dying. It so happens that that model fits very well with multi-core because sharing memory and synchronizing memory regions across multiple cores is a very expensive operation.
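A rough simulation of that robustness property is shown below. It is only an approximation: Python threads share one address space, so the isolation here is by convention (each worker keeps its state private), whereas Erlang enforces it in the runtime. The helper `spawn_monitored` and the `("EXIT", name, reason)` message shape are hypothetical and only loosely modelled on the idea of being notified when a process dies.

```python
import queue
import threading


def spawn_monitored(name, body, monitor):
    """Run body in its own thread; report how it ended to the monitor queue."""

    def runner():
        try:
            body()
            monitor.put(("EXIT", name, "normal"))
        except Exception as exc:  # a crash is reported, not propagated
            monitor.put(("EXIT", name, repr(exc)))

    thread = threading.Thread(target=runner)
    thread.start()
    return thread


def demo_isolation():
    monitor = queue.Queue()
    results = queue.Queue()

    def healthy():
        state = [0]              # private state, never handed to anyone else
        for i in range(5):
            state[0] += i
        results.put(state[0])

    def faulty():
        state = {"calls": 1}     # private state of the crashing worker
        raise RuntimeError("simulated software fault")

    threads = [
        spawn_monitored("healthy", healthy, monitor),
        spawn_monitored("faulty", faulty, monitor),
    ]
    for t in threads:
        t.join()
    exits = {}
    while not monitor.empty():
        _, name, reason = monitor.get()
        exits[name] = reason
    return results.get(), exits
```

The faulty worker dies with its own state and nothing else; the healthy worker finishes its computation and the monitor learns about both outcomes.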

Because basically you have to synchronize all the CPU cores when you want to access shared memory. If there is no shared memory between the programs they can be truly parallel and you have a very minimal region that needs to be synchronized. Multi-core was very welcome to Erlang programmers, it fit extremely well with the model. That wasn’t the initial intent, but it happened that way and it was a good thing for us.

   

6. That is also the model of shared memory that got introduced in Clojure. What do you think of it? There is memory sharing in Clojure.

Yes. Clojure very deliberately deals with in-process concurrency and dealing with shared memory objects or shared memory concurrency in a sane way that is easy to deal with and safe for the programmer. Rich Hickey, the author of Clojure, has been very clear about this choice that he does not address the same problem area as for example Erlang does. I think that the work that he has done is amazing and very thought-provoking. It has broken some new ground, I think, which is very interesting. It really does address another problem domain than Erlang does.

   

7. Can you tell us what you find most interesting in the language and its model of addressing multi-core?

You mean Clojure, right? Clojure builds on the idea of declarative programming and immutability, so that you really want to minimize side effects as much as possible. He has taken a very interesting approach to how data structures evolve over time, so that different parts of the program can look at different versions of the data structure, which allows you to implement a sort of concurrency and synchronization in a very efficient manner. Also because of the immutability and the way the data structures work, it also scales pretty well for multi-core, because you are not really locking the same objects. You are creating new instances in a very clever way.
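The idea of several versions of one data structure coexisting can be illustrated with a small persistent tree. This is a simplified stand-in written in Python, using path copying in an immutable binary search tree; it is not how Clojure's own collections are implemented, and the names `Node`, `insert` and `keys` are chosen just for this sketch.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Immutable binary search tree node."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Return a new tree containing key; the old tree is left untouched.

    Only the nodes on the path from the root to the insertion point are
    copied. Every other subtree is shared between old and new versions.
    """
    if root is None:
        return Node(key)
    if key < root.key:
        return root._replace(left=insert(root.left, key))
    if key > root.key:
        return root._replace(right=insert(root.right, key))
    return root  # key already present: reuse the existing version


def keys(root):
    """In-order traversal, as a list."""
    if root is None:
        return []
    return keys(root.left) + [root.key] + keys(root.right)


v1 = None
for k in (5, 2, 8):
    v1 = insert(v1, k)
v2 = insert(v1, 9)  # new version; v1 is still valid and unchanged
```

A reader holding `v1` never sees the insertion of 9, and no lock is needed because nothing is ever modified in place. The left subtree is the same object in both versions, which is what keeps creating new instances cheap.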

   

8. You started also talking about Haskell. The Haskell type system gives some guarantees for data parallelism and for multi-core. Can you tell us a little bit about that?

I will say that I’m mainly a follower of Haskell and I haven’t gotten to the point where I’m actually writing Haskell code myself, but it really is an amazing language, I think, in many ways the most powerful programming language on the planet. The type system in Haskell does give you a tremendous amount of control, especially in terms of concurrency, in how it separates side effect-free code from side effecting code. Erlang takes a different, more pragmatic approach. It is very good advice to programmers that they should keep the two apart, but there is no guarantee and there is no tool support that tells you when you are mixing side effecting code and side effect-free code in a bad way.

It’s more of something that we try to uphold by convention. Now, in Haskell you can really put clear boundaries between dirty code and pure code. It so happens that when you get into data parallelism, the compiler can exploit parallelism in a way that is very difficult to do in other languages because it knows at compile time that things cannot mix, so it can do very aggressive parallelization and optimization that you couldn’t do without that type system. I think it allows them to push the boundaries in terms of data parallelism and in terms of transactional memory, which is another interesting concurrency abstraction for multi-core.

Often, if you try for example to come up with efficient methods for transactional memory in Java, what you can do is you can try to have optimistic approaches where you hope that the program is not going to do something stupid and violate some of the properties that you would like to guarantee whereas in Haskell you can by design make it impossible for the programs to do stuff that they are not supposed to do. That really makes a big difference for the compiler writer because they can make much more aggressive assumptions about the code and therefore they can make more powerful optimizations.
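The optimistic style mentioned here can be sketched with a single versioned value. This is a deliberately small illustration in Python, not a real software transactional memory: it covers one cell rather than multi-variable transactions, and the names `VersionedCell`, `try_commit` and `atomically` are invented for the example. Note that the purity of the update function is only a convention here, which is exactly the kind of assumption the answer says a Haskell compiler does not have to make.

```python
import threading


class VersionedCell:
    """A single shared value with a version number for optimistic commits."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._lock = threading.Lock()  # held only for the brief read/commit

    def read(self):
        with self._lock:
            return self._value, self._version

    def try_commit(self, new_value, expected_version):
        with self._lock:
            if self._version != expected_version:
                return False  # someone else committed first
            self._value = new_value
            self._version += 1
            return True


def atomically(cell, update):
    """Apply update optimistically, retrying on conflict.

    update must be free of side effects, because it may run several times.
    Nothing in Python enforces that; it is a convention here.
    """
    while True:
        value, version = cell.read()
        if cell.try_commit(update(value), version):
            return


def demo(threads=4, per_thread=500):
    cell = VersionedCell(0)

    def work():
        for _ in range(per_thread):
            atomically(cell, lambda v: v + 1)

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.read()
```

If `update` printed something or sent a message, a retry would repeat that effect. A type system that rules this out lets the implementation retry freely.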

Haskell is a pretty advanced language and perhaps not for everyone; it does force people into a kind of thinking about abstractions that I think is very difficult for many people. It’s going to be interesting to see which course Haskell takes, but it certainly is on an increasing popularity trend. It’s difficult to predict anything about Haskell’s future. They had the credo "Avoid success at all costs!" and they have obviously failed with that because they are reasonably successful.

   

9. Some languages have been trying to use Haskell as inspiration for introducing functionality into them, like C# introduced LINQ and in Scala they used some of the type system of Haskell. Do you think it’s a good thing to integrate just a part of Haskell, but not all of it, or do you need all of Haskell to get all the benefits out of it?

I think this is sort of a necessary evolution and having worked in the industry and also in a reasonably conservative industry, I have come to accept it as a fact of life in industry that you very seldom can take just quantum leaps into a totally different technology. This evolution where you steal bits from other languages or other technologies and you carefully integrate them, I think this is something that should happen much more often. Personally, I am a bit frustrated sometimes with the polarization that happens very often in the programming language camps where you religiously stick to your own language camp and you try to prove always that your language is better than somebody else’s.

Look who’s talking! I’ve been working with Erlang since ’92 almost exclusively, so I do try to be open to other technologies as well, and one of the things I’ve been thinking often during that time is that when you invest this much time into one particular language, the best thing you can hope for perhaps is that the ideas of that language infect other languages. Because the more that happens, the easier it is for me to move into other technologies, because - really - the concepts, the way of thinking is more pervasive than the syntax.

I think this is also demonstrated by object oriented languages like Ruby and Python that have quickly gained a following, not necessarily because their syntax is so much like C, but the concepts are familiar enough that you quickly learn the differences and you quickly become productive, because they are still in the same concept domain. If you learn to do object oriented programming in Java or C++, Python is pretty comfortable because it’s still roughly the same type of abstractions, the same type of thinking.

If you come from an object oriented mindset, or say an object oriented mindset of mainly sequential programming without much exposure to concurrency, switching to something like Erlang will feel very alien. And it’s not about the syntax, it’s about choosing an entirely different way of approaching the problem, dealing with different abstractions and a different way of structuring the problem. That is the most difficult leap, not whether you have a semicolon or a comma at the end of a line or something like that. That really is trivial, but it's often what people get hang-ups about.

I think the ideas need to be discussed more often and it really is extremely useful to try to understand the core concepts of other paradigms. Learning Haskell for example is something that everyone should try to do, whether they end up using it or not. I would also like to say that learning Erlang is something that everyone should try to do because I think it represents some very important concepts, especially in the multi-core world.

   

10. Can you give us examples of each of these paradigms we talked about? Where do they fit best, for Erlang, for Haskell’s model and for the Clojure model?

For Erlang, one thing that I’ve been waiting for and that I think is coming to fruition now is that I’ve seen online applications as reasonably similar to telecom applications. When we tried to exploit this similarity in the ‘90s it didn’t go very well, because we were saying that if you have an e-commerce application online, you would want it to be as robust as a telephone switch, you would want it to be as scalable, you would want it to be able to handle thousands or hundreds of thousands of users with good response time and fault tolerance and handling of a lot of complexity.

But at the time that was not what e-commerce was about. It was, in many cases mainly about producing a pretty catalogue on the web and then the actual demand of the services was not that great, but it’s an entirely different world now. If you try to field an attractive online service you have to think about scalability, you have to think about robustness. You really want your system to be online all the time because if people start feeling that it’s not available when they need it, they are going to go somewhere else. It’s fiercely competitive and the market is global.

So, if you strike it big, you are going to have a ton of users very quickly. Scalability and robustness are not something that you can put off until later. You have to really start thinking about that from the beginning. Another thing that I think we are seeing is web services are becoming more and more interactive and growing more and more into real time services. We all monitor Twitter, for example. If you do a Twitter search, it will at least indicate that there are new hits; right now you have to update, refresh to see them.

If you go into a website like Collecta, it’s actually going to update in real time more or less and I think we’re going to see much more of that - services on the web that are increasingly interactive and increasingly responsive. We are going to see more and more high value services entering the online market. The more of this we see, the more it’s going to be important to have software technologies that scale, that are robust and also give you good turnaround time. These were the drivers behind Erlang in the ‘80s when the development of it first started.

Those were exactly the requirements that drove it. We were seeing that there was an ever increasing complexity in telephony systems: intelligent networks and more and more features and bigger and bigger networks and more services on top. We needed to address the complexity, but we also knew the basic requirements that we needed to support: performance, high scalability and also robustness. The requirement is that, even with planned downtime, you shouldn’t be down more than 5 min a year, tops. This is actually one thing that attracted me to telecoms in the beginning, when I first came across these requirements.

Having come from datacom, 5 min downtime a year in the datacom sector is a pretty extreme requirement, but that has been the target for decades in telecoms. That drives a lot of other requirements, like you have to be able to update code in a running system, because if you take it down to load the new code and transform things offline, you’ve burnt several years’ worth of downtime just doing one upgrade. It’s unacceptable. You have to have techniques for doing that in a running system.
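The arithmetic behind that downtime budget is easy to check. The sketch below uses the 5 minutes a year figure from the answer; the 20-minute offline upgrade is a made-up number, used only to show how quickly a single outage consumes the budget.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def availability(downtime_minutes_per_year):
    """Fraction of the year the system is up."""
    return 1 - downtime_minutes_per_year / MINUTES_PER_YEAR


def years_of_budget(outage_minutes, budget_minutes_per_year=5):
    """How many years of downtime budget a single outage uses up."""
    return outage_minutes / budget_minutes_per_year


five_min = availability(5)        # roughly 0.99999, often called "five nines"
one_outage = years_of_budget(20)  # hypothetical 20-minute outage: 4.0 years
```

Under those assumptions, one 20-minute offline upgrade would use up four years of the allowance.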

I think those core requirements are becoming mainstream now. They were not mainstream then, they were particular to certain niches but now this is what everyone has to try to figure out. If they don’t all jump into Erlang, which I certainly don’t expect them to do, it is very interesting to look at how these things were solved in Erlang and, by all means, get inspired, try to figure out how those qualities can be replicated in other environments. I think that is a worthwhile endeavor and we are willing to assist that. I am and many other Erlang programmers are very interested in helping out.
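As one small example of replicating such a quality elsewhere, the idea of updating code in a running system can be approximated with a message loop that accepts its own replacement. This is a loose sketch in Python and not how Erlang does it: the `("upgrade", new_handler, migrate)` message, the `server` loop and both handlers are hypothetical, and a real system would also need to handle failed upgrades and rollback.

```python
import queue
import threading

STOP = object()


def server(inbox, replies, handler, state):
    """Message loop that can replace its own handler without stopping."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            return
        if isinstance(msg, tuple) and msg[0] == "upgrade":
            # ("upgrade", new_handler, migrate): swap code, convert state.
            _, handler, migrate = msg
            state = migrate(state)
            continue
        state, reply = handler(state, msg)
        replies.put(reply)


# Version 1: state is a plain integer count.
def handler_v1(state, msg):
    return state + 1, f"v1 handled {msg}, count={state + 1}"


# Version 2: state is a dict, so the old state needs migrating.
def handler_v2(state, msg):
    count = state["count"] + 1
    return {"count": count}, f"v2 handled {msg}, count={count}"


def demo_upgrade():
    inbox, replies = queue.Queue(), queue.Queue()
    t = threading.Thread(target=server, args=(inbox, replies, handler_v1, 0))
    t.start()
    inbox.put("a")
    inbox.put(("upgrade", handler_v2, lambda old: {"count": old}))
    inbox.put("b")
    inbox.put(STOP)
    t.join()
    return [replies.get(), replies.get()]
```

The loop never stops between the two requests: the count carried over from version 1 is converted to the new state format and version 2 continues from there.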

   

11. Clojure - when do you think it fits really well as a choice for multi-core programming?

I think I must say that I don’t have any really good example. That’s not to say that I don’t think that there are any, but I know that there are some problems that Erlang does not solve very well. Especially when the problem calls for access to shared data structures. There is always a tradeoff. Like I said, one of the driving forces behind Erlang’s model was that you can’t risk corruption of data structures just because one process happens to die. I think that is one thing that is going to be a bit difficult perhaps in Clojure to model.

But, if you don’t really have that requirement and what you need is very efficient synchronization over shared data structures, there are certainly a lot of problems that fit that description. It seems like a very interesting approach. Now, there are of course other aspects: Clojure is a new language, it needs to grow a good strong community. Probably it has a good chance of doing that and I hope it will.

   

12. What is your favorite programming book?

The obvious answer would be the Cesarini-Thompson book on Erlang, but that is too obvious, isn’t it? One book that certainly is an extremely interesting read is Tony Hoare’s Communicating Sequential Processes. I think that is one of the mandatory books that you should at least read, especially for understanding message-passing concurrency.

Oct 13, 2010
