Jonas Bonér and Kresten Krab Thorup on Bringing Erlang's Fault Tolerance and Distribution to Java with Akka and Erjang
Bio Jonas Bonér is a co-founder of Typesafe and creator of the Akka event-driven middleware project. Kresten Krab Thorup is CTO of Trifork, where he's responsible for technical strategy, researching future technologies, and the GOTO and QCon conferences.
The Erlang Factory is an event that focuses on Erlang - the computer language that was designed to support distributed, fault-tolerant, soft-realtime applications with requirements for high availability and high concurrency. The main part of the Factory is the conference - a two-day collection of focused subject tracks with an enormous opportunity to meet the best minds in Erlang and network with experts in all its uses and applications.
Jonas Boner: I’m Swedish, I’m CTO of Typesafe, a company we just launched publicly about a month ago, and we’re building a stack on top of Scala: a middleware stack, developer tools, support and services, and also products around the Scala ecosystem. My main contribution there has been Akka; Akka is a middleware sort of product that is heavily influenced by Erlang. Long story short: Erlang OTP is the product that inspired me to start Akka. I’ve been writing middleware systems, servers and systems for most of my career, mainly living on the JVM. I got really excited when I got into Erlang and learned it, and I wanted to get others excited about it.
Some were, but a lot of people, most of my customers and my colleagues, were on the JVM. I had a hard time convincing them to move over to the Erlang runtime, and I felt: "Ok, these things that they implemented and the concepts are really sound and have been around for years, but they are really alien to what most Java developers see and use, and these things are too good to just be left to some Erlang geeks and this excellent community there." So I felt the need and urge to move some of these ideas over onto the JVM, and I started Akka as an attempt to do that. We took a lot of the best ideas from OTP and also some ideas from other languages, like the language Oz, which did dataflow concurrency, and we borrowed some from Haskell and Clojure and stuff like that, and put it together in a mix that tries to address both scaling up, vertically, and scaling out, horizontally.
With a unified platform, or programming model you could say. That, in short, is what Akka is.
Jonas Boner: Exactly, Akka is implemented in Scala; the Scala compiler emits JVM bytecode, so it runs on the JVM. Akka has a Scala API as well as a Java API, so you can use it from both; also, people have used it from Groovy and JRuby and other languages that run on the JVM. It’s implemented in Scala, but it’s not solely a Scala product.
Kresten Krab-Thorup: I really think that there are some new concepts, as Jonas was saying, that are hard to grasp for Java developers, but they need to have them in full, in my opinion, like the whole isolation mechanism for instance, which is really core to Erlang, and you have to get used to this aspect of Erlang. Erlang is many things: it is a functional programming language, it is concurrent, but I don’t think that’s so important, it’s not the big stumbling block; I think the big stumbling block really is as much practical. In Java you’re not used to having strict state management; maybe ten years ago there was this proposal out for doing isolates in Java. It’s kind of like that: having little islands of mutable state that can’t touch each other, and that’s a core concept which is beautifully encapsulated in Erlang the language.
So using Erlang the language to program this kind of way is a very good way to do it. You could do it with many other languages and that’s a kind of pipe dream of mine, you might even imagine having a new language that kind of adopts this model. The strict isolation is in my opinion important and it’s as profound as having garbage collection, so imagine you are looking at some other language that has garbage collection and you’d really like to have some of this cool stuff and then you implement a garbage collector but you can still use the other stuff. Then there are a lot of boundary conditions that are difficult to manage. Of course, it’s also very practical, so it’s a pragmatic thing.
Objective-C did the same thing, these objects are really nice, but you’d still be able to use C and C++ and integrate really nicely with that. And that’s also kind of a "bastard language", a mix of different worlds but it ended up being pretty successful so we hope for this mixture to be successful, too. But I would like to force that on, say this is the box, is having state management properly in the language. I can see the appeal of Scala and Akka and all these concepts because obviously it’s the same ideas that I’m pushing with Erlang and Erjang but I’m afraid a novice programmer that comes into a project, in maintenance mode, will find it too easy to do, like quick hacks and circumvent these things probably. But that’s only one part of the equation of course, there are many other things, depends on where you are coming from.
Jonas Boner: Scala favors immutability in the way that the default collections are immutable and you can ask for mutable ones if you want; it also makes a syntactic distinction between variables and values. In Scala code you always use values, which means final variables in Java terms, and stack-allocated variables. It really tries to make it easy to do the right thing when it comes to state management. Scala has this functional story; it mixes two different paradigms. From OO (object orientation) you have the more common idioms of using mutable state, and then you have the functional side of Scala that heavily favors immutability and the use of combinators and transformations of state, without having mutable state.
Akka also tries to favor immutability, but the problem with running on the JVM is that you have one single shared heap. It’s not like Erlang, where every process has its own heap; on the JVM you can start tampering with some other component’s state. So currently in Scala, and I think in all the JVM languages, this state management needs to be either by convention or somehow enforced at runtime by the tools. However, there is some research in the Scala community right now, in Martin Odersky’s team, to build that into the type system as a way of guaranteeing that only immutable state is used, and also to guarantee uniqueness, so that you can’t access another component’s state from one component. None of that has gone into the Scala compiler or the Scala type system yet.
But I think there is some interesting research that we might see ending up in Scala within a year or so. That is something I would love to have, because then we would get compile-time enforcement of this; even though we can’t get that inside the JVM itself, we can at least catch violations immediately.
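To make the point about convention versus enforcement concrete, here is a minimal sketch in plain Java (not Scala, and not anything from Akka). The class and method names are made up for illustration. It shows that an unmodifiable collection rejects writes only at runtime, and that a `final` variable freezes the reference but not the object behind it, which is why state discipline on a shared heap still relies on convention.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; the names are not from the interview.
final class ImmutabilitySketch {

    // An unmodifiable list (List.of, Java 9+) refuses writes, but only at runtime.
    static boolean immutableListRejectsWrites() {
        final List<String> names = List.of("erlang", "akka");
        try {
            names.add("erjang");
            return false; // not reached: List.of returns an unmodifiable list
        } catch (UnsupportedOperationException rejected) {
            return true;
        }
    }

    // 'final' freezes only the reference; the list itself can still be mutated
    // by anyone who holds that reference.
    static int finalReferenceStillMutable() {
        final List<String> names = new ArrayList<>();
        names.add("erlang");
        names.add("akka");
        return names.size();
    }
}
```

Neither case is caught by the Java compiler, which is the gap the type-system research mentioned above is meant to close for Scala.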
Kresten Krab-Thorup: There was always a problem also, ten years ago I was working on an EJB container. It was also one of the problems there. EJB is trying to be this resource container that would declare their dependencies and also have managed resources, you could open files but they would then not be closed in EJB and there was no way to manage that in a JVM environment unless you want to analyze all the code that you would potentially call. There was no way to do proper resource management that way. This aspect is something that is a very different mindset and I think it has a huge value to actually enforce this. Scala obviously lends itself to people who like types and feel relieved when they know their system is typesafe; it’s actually the same kind of thing with state management when you know you are in this world you’re like "It’s so easy, I don’t have to worry about how I affect these other things in the system".
So you get this original, in OO world we’ve always talked about encapsulation, how objects encapsulate, but it was a big fake, it never really encapsulated, it’s the same kind of relief you get when you start studying OO, it’s really nice to think of these as encapsulations and then 20 years later we realized these encapsulations should actually have that thing.
Jonas Boner: It’s actually why we have these massive test suites in OO systems, because you are never really sure: you do some refactoring over here and then some bug emerges over there, and then you fix that and then it breaks somewhere else, and so on, because all the state is so entangled all over the place that you really have no idea what’s going on. If you use a functional approach then all state change is local, and at least if it’s a pure functional approach you get referentially transparent code, where you can swap a function call for its value. Just knowing that I’m only affecting my state locally, and being sure it won’t break all over the place - that makes me sleep better at night.
Kresten Krab-Thorup: It’s a relief. It’s a "This makes things easy" kind of feeling.
Jonas Boner: In actors you can have mutable state but it’s completely isolated; it’s still a relief, it’s completely enforced. Especially if we’re going to go into the topic of concurrency, where Erlang shines and Erjang shines and Akka also tries to bring its thing to the table, it raises the abstraction level from plumbing and management of this state. Basically, if you’re on a single-threaded system then your system can be sort of deterministic. You have some idea what’s going on, but as soon as you start adding threads to the system everything becomes completely nondeterministic: you have no idea what’s going on, because threads don’t obey any nice encapsulation rules set up by the type system; they don’t compose, they run into each other, over each other, corrupt each other.
They’re extremely hard to handle but if you use something like the actor model then this plumbing becomes workflow instead. You think in terms of how the messages flow in the system and not about how you should guard your state against others, that makes it so much easier to reason about the code and understand what’s going on.
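The idea of isolated mutable state behind a mailbox can be sketched in a few lines of plain Java. This is not Akka's API; `CounterActor`, `tell` and `ask` are names chosen for this illustration, and a single-threaded executor stands in for the mailbox. The counter field is only ever touched by the mailbox thread, so senders on other threads need no locks.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical minimal actor: one mailbox, one thread, private mutable state.
final class CounterActor {
    // The single-threaded executor processes messages one at a time, in order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private int count = 0; // read and written only on the mailbox thread

    // Fire-and-forget message: add delta to the counter.
    void tell(int delta) {
        mailbox.execute(() -> count += delta);
    }

    // Request/reply message: the reply is completed on the mailbox thread.
    CompletableFuture<Integer> ask() {
        CompletableFuture<Integer> reply = new CompletableFuture<>();
        mailbox.execute(() -> reply.complete(count));
        return reply;
    }

    void stop() {
        mailbox.shutdown();
    }

    // Four sender threads each send 1000 increments, then the total is requested.
    static int demo() {
        CounterActor actor = new CounterActor();
        Thread[] senders = new Thread[4];
        for (int i = 0; i < senders.length; i++) {
            senders[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    actor.tell(1);
                }
            });
            senders[i].start();
        }
        try {
            for (Thread sender : senders) {
                sender.join();
            }
            return actor.ask().get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            actor.stop();
        }
    }
}
```

Calling `CounterActor.demo()` returns 4000, because every increment goes through the same queue rather than racing on a shared field. Unlike an Erlang process, nothing here stops other code from reaching the field through reflection or a leaked reference; the isolation holds by construction of this class only.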
Kresten Krab-Thorup: That’s all at the micro level: how you make your code manageable and understandable. I think using actors as a way to encapsulate this is very nice, but I think that at a higher level, the superstructure of the system, the really interesting part of both Akka and Erlang is how you build systems that are reliable, and I think that’s something that hasn’t been appreciated and valued enough. Obviously if you’re a customer and you buy a system you expect it to be reliable, but that has never really been much of a topic except in very distinct areas, close to the hardware management, like for instance telephony systems or robotics and other kinds of worlds. That’s never been in the mainstream of information systems.
Another reason why these things are interesting is that now we are building our systems by composing web services, like one of the healthcare projects I’m working on, we have more than 40 systems integrated, and you can’t rely on all of those behaving correctly and being up all the time and stuff like that. Also because they all depend on you it becomes much more important and valuable that it works and it always works. Because computer systems are getting interconnected and dependent on each other, all of a sudden it raises the value of availability. That’s a business level value, really, of Scala, Akka-style systems and Erlang-based systems. We have design patterns and system design patterns, architecture patterns for how to make things really work and really be stable and we are starting to see a lot of interest in that.
I think that’s the interesting angle where to push this. It’s not just new cool technology, it’s there for a reason having availability and patterns for building reliable systems, this is the reason we should start paying attention to.
Jonas Boner: Absolutely, that is where this kind of approach really shines, when you have distributed systems. You can’t build a reliable system with just one node, because if you get a hardware failure or run out of memory it’s game over. The way we normally or traditionally make systems like that, in the world of the JVM and also the CLR or whatever, is in most cases to emulate shared state in a way. We have this familiar world, even if I think it’s fundamentally broken; even on a single system people are in their comfort zone, so most of the systems that try to give you distributed computing or distributed stores try to stretch that sort of thinking out across many machines.
The problem is you can’t get any more than an extremely leaky abstraction, because when you have the network between two machines there is nothing like shared state. Underneath, it’s all message passing. I think that embracing the limitations of distributed computing and of the network instead, and seeing everything as message passing and share-nothing, makes it a lot easier to think and program.
Kresten Krab-Thorup: To make composable subsystems in general. The leaky abstractions make the subsystems and components not able to compose because they interact.
Jonas Boner: That is also what got me so interested in Erlang: their fault model. They are embracing failure as a natural state in the lifecycle of the application; it’s not something that we hope does not happen at all and try to guard ourselves against and prevent. Instead it’s a natural thing; especially if you write distributed systems, things will go down. You’d better have a good way of dealing with it, and I think the Erlang story is extremely appealing there, because it’s not trying to lie, to fake something that is not there, but instead embraces it at its core and makes the best of it. That’s a lot of what I’ve been trying to port over, especially for systems integration like Kresten says. You can never trust another guy’s system. I have trouble trusting my own; I would not bet my money on someone else’s.
Kresten Krab-Thorup: Being honest about the failures that happen. When I did a talk at GOTO Copenhagen some time ago, there was a quote that said: "Defensive coding is a sign of a weak system, weak platform", and it was quoted and tweeted all over. But the way you handle errors or faults with these kinds of systems is that you try not to handle them upfront: you write your code much simpler, just put assertions in and then you let it fail, and then you do fault handling at a meta level. So if you have that level of fault handling after the fact, instead of trying to prevent everything, it makes your code down here so much simpler. You’re coding to the spec and not protecting everything around the spec.
Of course you have to handle boundary conditions when information gets into your system at the TCP socket level; of course you have to check the protocol. But once data is inside your system, the clients should behave properly towards the services that they use and not check everything. It makes your code so much simpler, because you can assume that if something fails it doesn’t take down everything. It’s the notion you already have from your desktop operating system: if Word fails, it doesn’t bring down PowerPoint. At the micro level inside your program you can think of it as "If this component fails...", but it’s a really hard thing to get used to. It took me a while to get that into my brain, to be comfortable with the idea that having part of your program fail doesn’t mean everything failed. You can hear people say it many times, but you have to feel it in your body before you’re comfortable with it.
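The "let it fail, handle faults at a meta level" idea can be illustrated with a small plain-Java sketch. It is neither OTP's supervisor behaviour nor Akka's supervision API; `SumWorker` and `Supervisor` are hypothetical names, and an in-thread try/catch stands in for a separate supervising process. The worker is written to the spec with no defensive checks, and the supervisor's only recovery strategy is to throw the failed worker away and start a fresh one.

```java
import java.util.function.Supplier;

// Hypothetical sketch of "let it crash" with a restart-on-failure supervisor.
final class SupervisionSketch {

    // Worker: codes to the spec (messages are integers), no defensive checks.
    static final class SumWorker {
        private int sum = 0;

        void handle(String message) {
            // Integer.parseInt throws NumberFormatException on malformed input;
            // the worker deliberately does not catch it.
            sum += Integer.parseInt(message);
        }

        int sum() {
            return sum;
        }
    }

    // Supervisor: all fault handling lives here, outside the worker's logic.
    static final class Supervisor {
        private final Supplier<SumWorker> factory;
        private SumWorker worker;
        private int restarts = 0;

        Supervisor(Supplier<SumWorker> factory) {
            this.factory = factory;
            this.worker = factory.get();
        }

        void deliver(String message) {
            try {
                worker.handle(message);
            } catch (RuntimeException failure) {
                // Discard the failed worker and its state, then start a fresh one.
                worker = factory.get();
                restarts++;
            }
        }

        int restarts() {
            return restarts;
        }

        int workerSum() {
            return worker.sum();
        }
    }

    // Returns {restart count, sum held by the current worker}.
    static int[] demo() {
        Supervisor supervisor = new Supervisor(SumWorker::new);
        for (String message : new String[] {"1", "2", "oops", "5"}) {
            supervisor.deliver(message);
        }
        return new int[] {supervisor.restarts(), supervisor.workerSum()};
    }
}
```

In `demo()`, the malformed message causes one restart, and the fresh worker then holds a sum of 5; the earlier partial sum is lost with the worker that failed. Whether losing that state is acceptable is exactly the kind of decision that moves out of the worker and into the supervisor's policy.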
Jonas Boner: Right. You can never do that if you have a completely shared state, when everything is entangled. You need to have a separation, a level of indirection, and as Kresten says, one of the really nice things is that you have this nice separation of concerns and that gives me a way to almost declaratively configure how my system should react upon failure rather than program it, entangle it with my code. I’ve been very deeply involved in the Aspect Oriented Programming (AOP) community for a long time that’s what we are all trying to do, separation of concerns "That thing belongs in there and that thing belongs in there" and I feel the same thing when I’m using Erlang, in this sense as well. You have nice separation almost declarative way of doing it, put that thing there. It works very well in practice; it’s not just the theoretical exercise.
Jonas Boner: That’s much harder, it’s easier to start your own thing and start to influence the people who want to listen, you need to influence and press the people that don’t even want to listen, at least initially.
Kresten Krab-Thorup: It’s going to be interesting to see, we had the same kind of thing probably happening in Java, it became obvious that Java was the next big thing that could really raise the bar, make it much easier to build systems. Many businesses went from C and C++ to this odd new language called Java which wasn’t really that fast, it had all kinds of new weird concepts.
Jonas Boner: Right, it’s an education process. I think it’s an easier sell now than it was ten years ago. I remember when I started working for Terracotta, a Java clustering company: when we came in to customers they didn’t even know that they had a scalability problem or a fault tolerance problem or whatever, they were confusing performance with scalability, and we had to go in and start at ground level, at the computer science level: what kind of problem do you really have? Nowadays people are used to NoSQL databases, they are used to using all kinds of stuff to scale out, and they are used to writing distributed systems using REST and stuff like that; people are becoming more aware of these kinds of things. I feel it’s much easier; now there are almost too many ways of doing things instead.
Kresten Krab-Thorup: There is definitely a need for some kind of convergence phase and educational phase, where these concepts, which are obviously there, need to materialize in the broader developer community. The kinds of things that come from the fact that hardware can fail: you might as well just do message passing, because it can fail. The fact that the shared memory model doesn’t work: what does that materialize into in terms of thinking? I’m just saying, I’m deep into Riak, and there it’s the same kind of thing, where you say "write conflicts happen: in different places in the world two different people can set the same key to two different values, so at a later point you’ll have to deal with that." So there are some core concepts there that are just facts of nature; it’s almost like cheating.
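The write-conflict situation described above can be sketched as a toy register that keeps both conflicting values instead of silently dropping one. This is not Riak's API or data model: `SiblingRegister` is a hypothetical class, and a single version counter stands in for the causality tracking a real distributed store would need. It only shows the shape of the idea: a write based on a stale read is kept as a sibling, and a later reader has to resolve it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-key register that surfaces write conflicts as siblings.
final class SiblingRegister {
    private long version = 0;
    private List<String> siblings = new ArrayList<>();

    long version() {
        return version;
    }

    List<String> read() {
        return List.copyOf(siblings);
    }

    // Each write names the version the writer last read.
    // If that is the current version, the writer has seen everything and replaces it.
    // Otherwise the write was concurrent with another one and is kept as a sibling.
    void write(long basedOnVersion, String value) {
        if (basedOnVersion == version) {
            siblings = new ArrayList<>();
        }
        siblings.add(value);
        version++;
    }

    // Two writers both read version 0, then both write: a conflict.
    static List<String> demoConflict() {
        SiblingRegister key = new SiblingRegister();
        long seenByA = key.version();
        long seenByB = key.version();
        key.write(seenByA, "blue");
        key.write(seenByB, "green"); // based on a stale read
        return key.read();
    }

    // A later reader sees both siblings and writes back a merged value.
    static List<String> demoResolved() {
        SiblingRegister key = new SiblingRegister();
        key.write(0, "blue");
        key.write(0, "green");
        List<String> conflict = key.read();
        key.write(key.version(), String.join("+", conflict));
        return key.read();
    }
}
```

The merge rule here (joining the values with `+`) is arbitrary; the point is that the application, not the store, decides what a sensible merge is.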
Jonas Boner: What I like about that is that they lift these fundamental properties up into the hands of the programmer. There is a lot of responsibility, but on the other hand it’s trying not to lie more than it needs to. You don’t have to fake anything; I really like that approach, the less magic the better. That’s also something we really try to embrace in Akka as well. For example, we have software transactional memory, which has had a really bad reputation because of all the magic, all the implementations that try to just bless whatever mess the user makes. Instead of that we have taken an approach that is extremely explicit; it’s almost a bit ugly. You need to wrap all state that you want to be transactional in magical references. It’s very much in your face, and I think that’s a really good thing: you know exactly when you will pay the price for it, what kind of semantics you will have and so on.
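As a rough illustration of what "wrapping state in explicit references" looks like, here is a plain-Java sketch. It is not Akka's STM and not a real transactional memory: `ExplicitRef` is a hypothetical class covering a single reference, using an optimistic compare-and-set retry loop, with no multi-reference transactions. What it does show is the explicitness: shared state is visible in the type, and every update goes through one clearly marked call.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical explicit reference with optimistic retry (single ref, not an STM).
final class ExplicitRef<T> {
    private final AtomicReference<T> cell;

    ExplicitRef(T initial) {
        this.cell = new AtomicReference<>(initial);
    }

    T get() {
        return cell.get();
    }

    // Apply a pure update function; retry if another thread changed the value
    // between the read and the compare-and-set.
    T alter(UnaryOperator<T> update) {
        while (true) {
            T seen = cell.get();
            T next = update.apply(seen);
            if (cell.compareAndSet(seen, next)) {
                return next;
            }
        }
    }

    // Four threads each apply 1000 increments through the reference.
    static int demo() {
        ExplicitRef<Integer> total = new ExplicitRef<>(0);
        Thread[] writers = new Thread[4];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    total.alter(value -> value + 1);
                }
            });
            writers[i].start();
        }
        try {
            for (Thread writer : writers) {
                writer.join();
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return total.get();
    }
}
```

Because the update function may run more than once under contention, it has to be free of side effects, which is one of the costs this style makes visible rather than hiding.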
You see that in NoSQL also embracing all these concepts like the Dynamo Paper and all these things, better to lift them up than shield it away and make it "it should be so simple that you shouldn't even have to think about it", it’s never that simple.
Kresten Krab-Thorup: We have these fallacies of distributed computing; we’ve had them around for the last 10 or 20 or 30 years: RPC is bad and shared memory is bad and it will fail. But we’ve been living in this world of .NET Remoting and RMI, and things like Terracotta, trying to make everything look like shared memory, and for a while we’ve been able to push that to be good enough. But now, as systems become more integrated and as cloud is more widely used, people more and more run things on infrastructure services and hosting providers, and you get latencies, and all these network effects are just exposed even more, so those fallacies can come back and hit you. I think it’s about time, definitely.
Kresten Krab-Thorup: With virtualization and stuff, there is one branch where they are really trying to make all the old software continue to run, all the old Windows apps they don’t want to rewrite: how can we put this on some virtualized hardware and make an environment where they run on this cloud stuff? It’s like putting artificial life into these old systems, and there’s a good business case for that. But the integration that happens everywhere, across all systems, inevitably exposes failures that just happen, and they can so easily propagate and make other things fail, so we need to expose them. I completely agree.
Jonas Boner: The most OOP language, that depends on how you define object orientation, if you go all the way back to Alan Kay’s original definition of OO and the way he initially thought Smalltalk should work, I think it’s a pretty close match and it’s not even a remotely close match comparing that definition to classes that you have in C++ and Java.
Jonas Boner: Even asynchronous message passing. When I am out talking about Erlang or Akka or actors, message passing, all these active objects, I try to talk about them as objects: they encapsulate state and behavior, they can have message passing, and the way they are implemented in Akka and Scala they have polymorphism and inheritance, but that is not the essence in my view.
Kresten Krab-Thorup: There are as many definitions of object oriented as there are people saying the word; no matter how you respond to that question, you will have all the other people after you. I completely agree with Jonas. When I describe actor programming I talk about it as objects, and it comes very naturally, and lots of the techniques that we have developed with object oriented programming apply really well to this model, like CRC cards, where you have people playing the role of different classes. It’s much better when it’s for actor programming, because then it is natural: people talking to each other, having activities and those kinds of things. There are many intersecting concepts that are usable in both contexts.
Jonas Boner: I think it’s useful to think about it regardless of where you are with the definition, I think it’s useful to think about actors as objects, model them as objects, that’s the most important for me at least.