Kresten Krab Thorup, Robert Virding Discuss the Erlang VM

Bio

Robert Virding worked at Ericsson and was a member of the original Erlang design group. Robert now works for the Swedish Defence Materiel Administration (FMV) in a modelling and simulation group where he mainly works with computer games. He contributes to the Erlang community and has written a number of books and articles on Erlang. Kresten Krab Thorup works for Trifork, and is the creator of Erjang.

About the conference

The Erlang Factory is an event that focuses on Erlang - the computer language that was designed to support distributed, fault-tolerant, soft-realtime applications with requirements for high availability and high concurrency. The main part of the Factory is the conference - a two-day collection of focused subject tracks with an enormous opportunity to meet the best minds in Erlang and network with experts in all its uses and applications.

1. The origins of the Erlang VM

Kresten Krab Thorup: You’ve been working on the original Erlang VM, several generations of it at Ericsson, pre 1999. When did you first get involved with this? Were you on the original team of 2-3 people working on it?

Robert Virding: Joe was the original team, and he was doing his Prolog version of it. At the same time I was looking at implementing concurrent logic languages. All of us in the lab had done work on programming a switch and getting it to run, so that was something we knew about. Joe was working on Erlang, more of that got going, and I started working with him on the Erlang Prolog interpreter. We would write that together. Alternatively we would write each other’s code and things like this.

That was used to develop the basic principles of the language, and this was in the late ’80s. Then the request came in from our users, who said that they would like to use the language but it was too slow; could we make it 40 times faster? That’s when the work on the JAM started.

Kresten Krab Thorup: That’s Joe’s abstract machine.

Robert Virding: Yes. He was the one who sat down and defined the abstract machine for that, based on papers he’d read, some white papers from the Prolog machine and things like this. He started writing a compiler for it and also wrote an emulator. But Joe couldn’t write C, so that’s when I came in. Joe is a Fortran programmer so his C looked like Fortran. Mike came along, saw his code and was shocked and said "You can’t do that, I’ll write it for you" and he did. So Mike came in and wrote the actual emulator and Joe worked on the compiler.

One of the things we found when we left the Prolog interpreter is that there are an awful lot of libraries you have to write to get the system to work. It’s not just the compiler and the emulator. You need the libraries, I/O libraries, lots of basic libraries, so I started working on them. Then we had to make decisions about how things were going to work, apart from the basic emulator: how the system is supposed to work. We did quite a lot of work on that.

Kresten Krab Thorup: Erlang has always been a concurrent language. But not always with SMP support.

Robert Virding: SMP support came much later.

Kresten Krab Thorup: What do you think of when you think of concurrency versus SMP? Why wasn’t it part of the design in the beginning?

Robert Virding: SMPs didn’t exist then, or at least we didn’t know about them in that way. Much of the design of Erlang was driven by the requirements of what we were planning on building with it: telecom systems. There you had the need for massive concurrency. It was very message-passing based. You needed error handling; that was also one of the prime requirements, something that just had to be in the system. In a telephone switch you could lose a call occasionally. That was OK, but if the switch went down, that was a major catastrophe. That type of requirement was in from the beginning. That’s where it came from.

Kresten Krab Thorup: It’s about modeling concurrency and being able to think about your programming in a concurrent way.

Robert Virding: The problem was the application was described in a very concurrent way from the definition languages. Specifications were very concurrently defined, with things sending messages to one another, sitting waiting for messages, looking for messages and doing something. That mapped, we felt, very well onto Erlang.

Kresten Krab Thorup: It’s funny how that design context of 20 years ago is suddenly interesting for everybody.

Robert Virding: Yes. I think it’s funny. I think it’s fun that you can take something that was actually designed for controlling hardware and it works in a completely different application domain: web applications, databases or whatever. It’s something we hadn’t conceived of at the time.

2. Garbage Collection vs Realtime Behavior

Kresten Krab Thorup: At least in the original Erlang VM, and also the more recent ones, there are a lot of constraints built in that are targeted at soft real time or the specific kinds of applications that you had, like the separate heaps.

Robert Virding: That comes from 2 requirements. You want process isolation for error handling: when processes crash, they are not supposed to affect other processes, at least not unintentionally. Having separate heaps is one way of doing that. It also reflects back on the real time problem: if you have garbage collection like we do, most normal collectors will stop the system while they’re collecting.

We could not accept long pauses. "Hello, Mike", then nothing happens for a minute and then your system comes back; it just wouldn’t work. One relatively simple way of solving that was to give each process its own separate heap, which means the heap is generally small. You can garbage collect each one separately, and the garbage collection time is very small and won’t be noticed.

Kresten Krab Thorup: That comes at the cost of copying everything. Isn’t that very expensive?

Robert Virding: Yes, in one sense. But it works out. I did a few experiments with other machines, where we would have one heap and more or less real time garbage collection. They were about the same speed when running the application, even though you were sharing things and weren’t copying messages.

Kresten Krab Thorup: Why is that?

Robert Virding: My guess is that real time garbage collectors are more complex and they cost more. So you have to pay in some way to get there, a typical no free lunch. If you want a single heap, if you want a real time collector, it costs you processing power to do it. If you have separate heaps, the collector is much simpler and the garbage collection is more efficient, but you pay the cost in copying the messages. In retrospect I can say a single heap would probably be much more difficult to put on an SMP.
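
To make the trade-off concrete, here is a minimal Python sketch (not Erlang, and not how the Erlang VM is implemented) of the copy-on-send idea: each toy process owns its data, and sending a message hands the receiver a separate copy. The `Process` class, its `send` and `receive` methods, and the `call_state` message are all invented for illustration. In Erlang the data is immutable, so the copy matters for keeping the heaps independent for garbage collection; in Python, where values are mutable, the same copy also makes the isolation directly visible.

```python
import copy
from collections import deque


class Process:
    """Toy stand-in for a process with private data and a mailbox (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.mailbox = deque()

    def send(self, target, message):
        # Copy on send: the receiver gets its own copy, so the two
        # "heaps" never share any structure.
        target.mailbox.append(copy.deepcopy(message))

    def receive(self):
        # Take the oldest message out of the mailbox.
        return self.mailbox.popleft()


sender = Process("sender")
receiver = Process("receiver")

call_state = {"caller": "Mike", "digits": [4, 2]}
sender.send(receiver, call_state)

# The sender keeps changing its own data after the send.
call_state["digits"].append(7)

received = receiver.receive()
print(received["digits"])  # [4, 2]
```

The receiver's list is unaffected by what the sender does afterwards, and either side's data could be reclaimed without looking at the other. The price is the `deepcopy` on every send, which is the copying cost discussed above.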

Kresten Krab Thorup: It’s definitely an area where there has been a lot of research and work in the Java space to do SMP garbage collectors.

Robert Virding: That’s getting pretty difficult. I looked a little bit at it. I haven’t done any of it myself, but I looked at what people have done there and it’s definitely getting difficult to do. In that sense, having these separate process heaps is much simpler. As a programmer, I can also say debugging real time garbage collectors is difficult.

Kresten Krab Thorup: There is a lot of VM stuff which is difficult to do.

Robert Virding: That’s really difficult, because you’ll find there is a bug somewhere, but you’ll detect the bug later, a few garbage collections later, when some pointer is wrong and you have absolutely no idea what the pointer should have pointed to or why it’s wrong. It makes it very difficult to debug. Having a simple collector is worth a lot. They’ve done a lot with the garbage collection now. They’ve improved it quite a lot; it is still separate per process, but the collector in each process is much better today.

Kresten Krab Thorup: You worked on it until 1999. What are the major developments that happened to the virtual machine after you left?

Robert Virding: Definitely most of the SMP work came afterwards.

Kresten Krab Thorup: That has come only very recently.

3. Integrating Erlang with native code and its perils

Robert Virding: That’s the last couple of years, I think. It has existed more or less on and off; occasionally people have done some tests with it, but it never really caught on. There hasn’t been the need for it. That’s quite recent. Another recent thing is the NIFs, native functions, which I think are a cool idea.

Kresten Krab Thorup: I don’t like them. It’s interesting, because if you look at the Java space there has been this development: in the early days, when the VM was slow, everybody was writing snippets of things in C code to make some subprocedure fast, with all the JDBC database drivers being native. Then, over time, as the VM has improved, it’s come to a point where everybody agrees that you don’t want any native code at all. You’d rather run a little slower, but run within the safe environment of the VM. Do you think the same thing is going to happen with Erlang?

Robert Virding: Hopefully, I do. Because the trouble with the NIF interface is that you can do a lot of ugly things in there. You can do things you shouldn’t do, which the Erlang VM won’t support, and you can crash it that way. Hopefully people will learn. What I can see it being useful for is interfacing to existing libraries. For example, in the early ones the regular expression library uses that. That’s a good use for it.

Kresten Krab Thorup: In general, in Erlang when you interface to something you’ll spawn a process. You support a separate process that runs that thing. That’s really the preferred way.

Robert Virding: That was our view of the world, that everything was processes and separate processes and the outside world was also processes. They became ports and you used messages to communicate between ports in the Erlang system. The outside world just looked like an Erlang process or a number of Erlang processes. That gave you a very consistent view of the world and view of the whole Erlang system as such.

Originally they were separate operating system processes, but for efficiency reasons you could have them linked in as libraries. Then, of course, you compromised the Virtual Machine. If they were separate operating system processes, they could crash without crashing the Erlang Virtual Machine and you would just get a signal saying the process crashed, like any other Erlang process. Once you build them into the Virtual Machine, if they crash, that will crash the whole Virtual Machine.

That was the risk you took doing it. Now you see the same problem with the NIFs as well. You are really opening up a lot of power in there and people can do nice things with it. They can also do bad things with it and if they do bad things, they’ll pay eventually. I think you have to be very careful in there, because the Erlang VM has a few limitations and requirements that you have to follow for everything else to work.

Kresten Krab Thorup: Yes, like not running for too long.

Robert Virding: Not running too long, not doing destructive updates of memory. NIFs will allow you to do that if you want to, but the rest of the Erlang VM won’t support it, and there is no saying what will happen if you do. The worst thing that could happen is that sometimes it will work and sometimes it won’t, and it will just do strange things.

4. Communicating with the outside world with Erlang ports

Kresten Krab Thorup: In the original libraries, for instance, I’ve been looking at the TCP/IP; I’m very fascinated by the whole interaction between TCP/IP and asynchronous I/O. Is that something you worked on?

Robert Virding: I didn’t work on the TCP libraries. That was Claes Wikström, who did most of those. But it’s very nice that you can get the asynchronous view of the world, which fits both into the Erlang view of the world and into the TCP view of the world as well.

Kresten Krab Thorup: But that’s one piece of code that should have been written in Erlang.

Robert Virding: Probably. I quite agree. I used to be much more of a purist, trying to write as much as possible in Erlang. For example, that was one reason we had ports as processes. That would mean that someone using a port would just send a message and wouldn’t really know where the work was done. So it could have been done in Erlang, it could have been a chain of processes, or it could have been done externally, or a little bit of both, and you wouldn’t see that.

Kresten Krab Thorup: Erlang has this notion of a port, which is almost like a process, but it has something external attached to it. You can send messages and do things with that external entity.

Robert Virding: Yes, and then hopefully send messages back for communication. But also the error handling mechanisms work as well. When you create a port, you’ll link to the port. If something goes wrong with the process, you will close down the port and if something goes wrong with the port, you’ll close down the process. All the basic concurrency mechanisms work on ports as well.
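
The linking behaviour described here can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Linked` class and its `link` and `exit` methods are hypothetical names, and real Erlang links have more to them (trapping exits, for example, is ignored here). What it shows is the symmetry: a link is bidirectional, so a failure on either side of a port/process pair takes the other side down too.

```python
class Linked:
    """Toy model of something that can be linked: a process or a port (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.exit_reason = None
        self.links = set()

    def link(self, other):
        # Links are bidirectional: each side records the other.
        self.links.add(other)
        other.links.add(self)

    def exit(self, reason):
        if not self.alive:
            return
        self.alive = False
        self.exit_reason = reason
        # Propagate the exit to everything that was linked to us.
        peers, self.links = self.links, set()
        for peer in peers:
            peer.links.discard(self)
            peer.exit(reason)


# Something goes wrong with the port: the owning process goes down too.
owner = Linked("owner process")
port = Linked("port")
owner.link(port)
port.exit("driver crashed")
print(owner.alive, owner.exit_reason)  # False driver crashed

# Something goes wrong with the process: the port is closed down.
owner2 = Linked("owner process 2")
port2 = Linked("port 2")
owner2.link(port2)
owner2.exit("badmatch")
print(port2.alive)  # False
```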

One way of looking at it: the only difference is that you can test whether it’s a port or a process. Actually, when we started developing ports we didn’t really know whether we were going to allow you to see the difference or not. We did eventually.

Kresten Krab Thorup: I think that should have been hidden.

Robert Virding: I think that too. I think definitely it should have been hidden, I think you shouldn’t have seen it.

5. Throughput vs Latency in Erlang VMs

Kresten Krab Thorup: We should create a framework for user-level ports, where you could just write them in Erlang and get those hooks. I’ve been working on another virtual machine, and I’ve done a lot of work on web servers and various service systems where time is not as critical. There it’s more about throughput. That’s the major tradeoff that I make, in part, obviously, because I’m basing it on Java: I’m letting go of some of those real time requirements and pushing for throughput. As Erlang becomes more and more popular, don’t you think it’ll grow out of that real time focus?

Robert Virding: It probably will. The design was driven by the requirements of the problems we were looking at. Real time was one of the design criteria we had. If you are running a different type of application you get different requirements, and you can very well have different Erlang implementations that look at different things.

Kresten Krab Thorup: Could you imagine a VM where you can kind of turn some knobs and say "This part of the system is throughput important and this part of the system is real time important"?

Robert Virding: Yes, but I wonder if you can mix them. Because that means that unless you have them in a separate system, your throughput system is going to affect the real time system as well or the other way around. If you are going to have these real time requirements, they are going to affect how the throughput works. The throughput can’t break those if they are hard requirements. You might have to have separate nodes doing things.

Personally I think it would be very interesting to see what the results will be when you take these 2 VMs with different properties and start running big real applications on them. That’s one of the things I found when I did my alternative implementations: in the end they are pretty much equivalent in speed of running applications, where some things are better on one than on the other.

Kresten Krab Thorup: You get a kind of fluctuating effects when things are under pressure.

Robert Virding: It costs you one thing, you have to pay for something else. I mean, you get something else. That is what I was hoping for with a single heap, that you would get the best of both worlds. I think mainly the garbage collection costs were higher. The garbage collection became more inefficient.

Kresten Krab Thorup: Do you think we’ll see other languages coming out with this kind of scheme? Do you think Erlang as such will push on to become another real mainstream language, or do you think we need another step in that evolution before these concepts become mainstream?

Robert Virding: I don’t know. If you look, it seems people want for their applications the type of things that languages like Erlang can give you. It might be difficult to convince them to choose a different language to meet these requirements, especially a language like Erlang, which is in many ways quite different from other languages or from Java. Hopefully, they will pick languages which have these properties because they’ll need them. It might be Erlang, but it might be another language that comes along which takes some of these features and implements them.

Kresten Krab Thorup: Now you are not doing telephony systems any more, but you are still doing the Erlang stuff. Are you using the realtime things? Is that important for what you are doing?

Robert Virding: For what I’m doing, no. I’ve been implementing a few languages on top of Erlang; that’s what I’ve been doing. I’ve had the Erlang properties; they’re there, I’ve used them, I’ve accepted they’re there. I’m not trying to work around them or get past them; they are just there and they are not what I’ve been looking at. In that sense, I’m a pure sequential Erlang user, but nothing I’ve done would break the Erlang concurrency model as such.

6. Languages on the Erlang VM

Kresten Krab Thorup: There are several languages now running on top of the Erlang VM. What are the general properties of those?

Robert Virding: There are 3 I know of. There is one called Efene, which presents a more classic syntax on top of Erlang; as far as I know it’s basically Erlang, just with a different syntax. There is one called Reia, which gives a more object-oriented model on top of Erlang. It uses Erlang, but it has the concept of objects, sending messages to objects, classes and things like this. Then there is my own Lisp for Erlang, LFE, which gives you a Lisp syntax on top of Erlang and also things like Lisp macros and all the other goodies Lisp has. But they all use the Erlang VM, and at least LFE and Efene, as far as I know, don’t really change the Erlang model.

Kresten Krab Thorup: That means you can integrate it very easily with existing code.

Robert Virding: Yes, that’s one of the goals of LFE: that you can write standard code in LFE and use it in Erlang applications. Everything else is compatible.

Kresten Krab Thorup: Is there anything else you would add to the core language after all these years?

Robert Virding: Variable scoping. That wasn’t really a conscious decision; it just grew out of the Prolog background. Some people like it, I don’t. I would remove it if I could, or add a more traditional type of variable scoping. That would also make it easier to do things like defining local functions and stuff like this.

Kresten Krab Thorup: How is variable scoping special?

Robert Virding: A variable is defined for a whole function, and once it gets a value it has that value all the time; you can’t change it. There is only one version of the variable. There is no scope for it. If you use it somewhere, it will have the same value. If you define it, you have to make sure that you give it a value everywhere, because otherwise you might get to a case where it suddenly has no value, and things like this. You can get some weird cases where you are defining variable values.

Most people don’t, because they write pretty nice code, but you can do really strange things with defining variable values, and it can be very difficult to work out where the value actually comes from.

Kresten Krab Thorup: I had a strange bug yesterday introducing a new parameter to a function and all of a sudden that had the same name as the variable and then something didn’t match.

Robert Virding: Then something didn’t match, and the compiler won’t complain because you haven’t done anything wrong; you’ll just find when you run it that it doesn’t match. That’s also why, for example, you get various errors from doing cases. A typical problem is that you define a variable local to one case clause, or in a couple of them. As long as it’s not defined in all of them the value is not exported; then suddenly you define it in the last one, the value is exported, and it clashes with something else.

Kresten Krab Thorup: All these things are just artifacts of the compiler.

Robert Virding: Yes. There is no problem internally. It’s all transformed into a normally scoped language internally. It’s just the top level. It’s pretty easy to change, though of course people wouldn’t like the syntax change. In actual work, you would probably need a few days’ work to fix that and then recompile everything. That would be one thing. Personally I don’t have problems with the syntax as it is.

Kresten Krab Thorup: Do you get used to it?

Robert Virding: Yes, you get used to it after a while. I think most people get used to it. You read cases of people that say "In the beginning it was a bit strange, but after a few weeks I didn’t really think about it anymore". Probably it’s a little bit strange, but it’s quite succinct and does what it’s supposed to. Then, I like a simple language, so I would make sure to keep it simple and be restrictive about what you add to it. I wouldn’t go as far as to say that every time you add something to your language you should remove something else, but I’d be restrictive too.

Otherwise I’m very much against special cases. There is always a special case and there is always someone who can justify exactly why you want that special case and they can give you very good reason for it. But once you start adding special cases, they just get more and more and this complicates the whole thing both for the implementation but also for the users, because there are more and more special cases. I’m very much against special cases.

Kresten Krab Thorup: The Virtual Machine has lots of special cases.

Robert Virding: The Virtual Machine has lots of special cases, yes, but hopefully we don’t see them. The Virtual Machine does a lot of strange things internally. It’s gotten quite complex. One feature with the JAM was that the Virtual Machine was very simple, it was quite comprehensible. It’s gotten more and more stuff in.

Once you start optimizing for speed it becomes more complex. One thing I think would be interesting would be to do your own Erlang implementation for some sort of specialized hardware or specialized application. If you want to optimize something, for example the emulator size, the instructions or the code volume, that could be an interesting specialized implementation. It’s a bit of work and it’s not something I would do without having a goal.

Kresten Krab Thorup: With a simpler core language and more stuff written in Erlang as opposed to C, it would be much easier to build new implementations.

Robert Virding: That’s partially done today. In the compiler there is something called Core Erlang, which is a much simpler functional language and is probably much easier to implement, or at least to make a simple implementation of. That’s just the compiler part of the language; it’s not the whole runtime part, which can get quite difficult. I don’t know how much of that you see when you are doing Erjang.

Kresten Krab Thorup: I just see the BEAM code, the compiler is a program to run.

Robert Virding: But you still have to implement the whole Erlang runtime.

Kresten Krab Thorup: The whole Erlang runtime and all the BIFs and all the special cases and semantics of those BIFs. The big hairy part in the middle of this is the TCP/IP driver. It’s a huge state machine.

Robert Virding: I think people have done 2 test implementations of TCP/IP on top of IP in Erlang. From what I’ve understood, doing a naïve implementation is relatively simple, but if you want an efficient implementation it’s quite difficult. That’s hopefully what the C one has done. One thing that’s missing today is actually a proper definition of what Erlang is. There was one 15 years ago but that died.

Kresten Krab Thorup: I guess it gets more and more complicated. One thing is what the Erlang core language is, but you can’t really do anything without a pretty significant set of core libraries.

Robert Virding: If you could define what the core language is and maybe a few core libraries, very few, then you basically take everything else and put it on top of that. Porting would therefore be easier once you get the core out. Then, of course, the more you have written in Erlang, the easier it is. Sometimes I don’t know how much faster it is to implement in C than in Erlang. Some things I imagine would be, but it would depend.

7. The origins of Erlang

Kresten Krab Thorup: It must be interesting to have been part of creating something like this and all of a sudden seeing it explode.

Robert Virding: Originally we started designing a language to try to meet their requirements, and I think it’s been fun to see it go from something which was designed to control telephony hardware to completely different application domains. And it works; people can use it. That’s been very interesting and fun. It shows that some of the basic criteria we had in the requirements were more general than we thought they were.

There is the concurrency, the message passing, the error handling; they are more general properties than we thought they were. I think it’s a lot of fun, that’s why I like coming to these conferences, to see what people do with it. I’m really surprised it’s gone so far. When we started the web hardly existed so it just wasn’t something that we considered. I’m truly impressed with what they do.

Kresten Krab Thorup: We are all impressed by your work, too.

Robert Virding: I’m impressed with what they managed to do with it and how they use it. Erlang: The Movie. We expected the movie, which wasn’t our idea, but we were actually quite serious when we did it.

Kresten Krab Thorup: You all definitely looked very serious.

Robert Virding: We were serious in the message we were trying to bring over. We believed in it and it did actually work. All the demos that were shown in the film actually worked, so they were true runnings of the system.

Kresten Krab Thorup: Who was it shown to, originally?

Robert Virding: I think, if I remember rightly that there was going to be a great trade show and we were going to present Erlang, so we had an exhibition - that’s what the train set was for. We made a number of Erlang presentations and got help in making good presentations, which is difficult and someone thought of making a movie as well to advertise it. We were very serious about the message, but we didn’t really have to say much about the actual movie itself. I think someone called it "Monty Python with technicians" or something like that.

Kresten Krab Thorup: You were sitting there, reading. It’s obvious there was a piece of paper out there and you were trying to read it.

Robert Virding: I was sitting there trying to point to the panel or something on the screen. It was fun doing it. It was very different to do that and you can feel for what you do in the film or not.

8. Erjang - Erlang on the JVM

Robert Virding: But Erjang, it’s one of the things I’ve been interested in for years. Some people have asked, "Wouldn’t it be better if Erlang ran on the JVM? Because everyone’s got the JVM and the JVM is fantastic," which in many ways it is. Why not run it on the JVM? And people said, "Probably it won’t work, it won’t be very good, because Erlang requires this and this and the JVM doesn’t have that, mainly concurrency."

As far as I know, no real test has been done. There have been a number of Erlang-like concurrency packages written for Java and the JVM or for other languages and things like this, but no serious test has been done, as far as I know. I’m honestly very interested to see the results, because if it goes well, given how far it has got now, you’d better run whole applications, whole problems on it to test and measure it and to get its properties.

That I think would be interesting to see. It might be we were right, it might be we were wrong. It might be as the experience from doing other Erlang implementations that it pretty much evens out.

Kresten Krab Thorup: Yes, you get different behaviors, different tradeoffs.

Robert Virding: From what I can gather, sequential Erlang code will run faster on the JVM.

Kresten Krab Thorup: Sequential code will run faster. Yes, that’s what it looks like right now. Even concurrent code, if it is run with one scheduler. That’s one of my challenges right now: I’m too eagerly pushing processes onto other schedulers. It just picks any scheduler that is not doing anything and puts the job there, whereas some of the experience from the Erlang SMP work says it’s good to have a strategy that keeps processes on the same scheduler. Doing something like that will improve things quite a lot.

Robert Virding: One thing you hear when they describe the work on SMP is that getting the schedulers working well has been difficult.

Kresten Krab Thorup: We can definitely leverage those experiences.

Robert Virding: They’ve worked a lot with that and found that the changes are very noticeable, especially when you get more cores. The more cores you have, the more you notice whether you get the scheduling right or wrong; you see it much more.
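
The two placement strategies mentioned in this exchange can be compared with a small Python sketch. It is a toy model, not Erjang's or the Erlang VM's scheduler: `place_any_idle`, `place_sticky`, `count_migrations` and the `wakeups` trace are all made-up names and data. Each entry in the trace says which process became runnable and which schedulers were idle at that moment; a migration is counted whenever a process lands on a different scheduler than last time.

```python
def place_any_idle(last_scheduler, idle):
    """Pick the first idle scheduler, ignoring where the process ran before."""
    return idle[0]


def place_sticky(last_scheduler, idle):
    """Prefer the scheduler the process last ran on, if it is currently idle."""
    return last_scheduler if last_scheduler in idle else idle[0]


def count_migrations(place, wakeups):
    """Count how often a process ends up on a different scheduler than before."""
    last = {}
    migrations = 0
    for pid, idle in wakeups:
        chosen = place(last.get(pid), idle)
        if pid in last and last[pid] != chosen:
            migrations += 1
        last[pid] = chosen
    return migrations


# (process, idle schedulers at wake-up time), an invented trace
wakeups = [
    ("a", [0, 1, 2]),
    ("b", [1, 2]),
    ("a", [1, 0, 2]),
    ("b", [0, 1]),
    ("a", [2, 0]),
]

print(count_migrations(place_any_idle, wakeups))  # 3
print(count_migrations(place_sticky, wakeups))    # 0
```

On this trace the "any idle scheduler" policy moves processes three times while the sticky policy never does. A real scheduler also has to balance load, which this sketch leaves out entirely.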

Kresten Krab Thorup: That’s one thing. The other thing is what happens if something goes wrong. That’s one of the places where I’m a little worried in Erjang. I obviously have process isolation from the Erlang program perspective, but in reality there is always this thing that one process can use a lot of memory. They go wild in various ways. What happens in those edge cases? Maybe it’s easier to control in Erlang, but still you hit the hard limits of various kinds.

Robert Virding: One benefit of having the separate process heaps is that you can set limitations on them. I can’t remember if they do it, but you could set process size limitations, whereas if you have a single heap you can’t do that.

Kresten Krab Thorup: The same goes for stack sizes. One of my issues is what happens if loading bytecode fails; it’s in the error handler. If it fails to load the bytecode, it uses the standard pretty printing to print out the entire binary. That particular pretty printer is a normal recursive function, a non-tail-recursive function, so there is one iteration for every byte and that creates a hugely deep stack. When I get a little bytecode I have an issue.

Robert Virding: You need to print the bytecode.

Kresten Krab Thorup: This is just the code in the module called error handler. If calling the BIF load module fails, then it will print the bytecode. Maybe we should go fix that.
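
The stack-depth problem described here is easy to reproduce in miniature. The following Python sketch is an analogy only, not the Erlang pretty printer: `format_recursive`, `format_iterative` and the `stats` dictionary are invented for illustration. The recursive version still has work to do after each recursive call returns, so every byte keeps a call frame alive; the loop-based version produces the same text with constant call depth.

```python
def format_recursive(data, depth=1, stats=None):
    """Non-tail-recursive: one pending call per byte of input."""
    if stats is not None:
        stats["max_depth"] = max(stats["max_depth"], depth)
    if not data:
        return ""
    head, rest = data[0], data[1:]
    tail = format_recursive(rest, depth + 1, stats)
    # The result is built after the recursive call returns,
    # so this frame must stay on the stack until then.
    return f"{head}," + tail if tail else f"{head}"


def format_iterative(data):
    """Same output, but the call depth does not grow with the input."""
    return ",".join(str(byte) for byte in data)


binary = bytes(range(50))
stats = {"max_depth": 0}

assert format_recursive(binary, stats=stats) == format_iterative(binary)
print(stats["max_depth"])  # 51: one call per byte, plus the final empty call
```

With 50 bytes the depth is harmless; with a module-sized binary the same pattern needs a call depth proportional to the size of the binary, which is the issue on a runtime with a bounded call stack.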

Robert Virding: There is a lot of work you can do in the core things. A lot of things there have never been documented and you can do a lot of fun things in there as well, which you probably shouldn’t, but you can. One of the things we’re interested in is, since you are running on top of the JVM, how are you going to interact with other things? With Java code? How are you going to enforce the cleanliness between the 2? I assume you are going to do it.

9. Erjang Java Integration

Kresten Krab Thorup: Interacting with other Erlang programs is obviously just with Erlang distribution. But talking to Java code you’ll go through something like NIFs. You write essentially a piece of wrapper code in Java where you put some annotations in your code, you write your code in a special way where there are some naming conventions, there are some module correspondences.

Erjang will then see these special names, pick them up and turn them into BIFs. Essentially it’s very easy to write one; a BIF and a NIF are exactly the same. It’s just a static Java function that follows certain naming conventions. There is a whole range of different JVM languages and some of them excel in a very natural flow of values across language boundaries. That comes with a cost, obviously, and it’s very easy to break the semantics of either side.

Erjang is more strict in that there is a specific set of types where you can only access the data immutably, so you get an ETuple. You don’t have access to mutate the fields. There is the API for the values that we expose that way. Just as you have linked-in drivers and NIFs, etc., they can do weird things.
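
The idea of annotated static Java functions operating on immutable values can be sketched roughly as follows. The `@Bif` annotation, the `Tuple` class and the `BifRegistry` are hypothetical names invented for this illustration; they are not Erjang's actual annotations or types, and the lookup by reflection is only one assumed way such functions could be picked up.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical marker for a static Java function exposed as a built-in.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Bif {
    String module();
}

// Hypothetical immutable tuple: read access only, no way to mutate fields.
final class Tuple {
    private final Object[] elements;

    Tuple(Object... elements) {
        this.elements = elements.clone(); // defensive copy keeps the tuple immutable
    }

    int arity() {
        return elements.length;
    }

    Object element(int oneBasedIndex) {
        return elements[oneBasedIndex - 1];
    }
}

// Built-ins are plain static methods; the method name doubles as the function name.
final class DemoBifs {
    @Bif(module = "demo")
    static Object tuple_size(Tuple t) {
        return t.arity();
    }

    @Bif(module = "demo")
    static Object element(int index, Tuple t) {
        return t.element(index);
    }
}

// Scans a class for annotated static methods and indexes them as module:name/arity.
final class BifRegistry {
    private final Map<String, Method> table = new HashMap<>();

    void register(Class<?> holder) {
        for (Method m : holder.getDeclaredMethods()) {
            Bif bif = m.getAnnotation(Bif.class);
            if (bif != null && Modifier.isStatic(m.getModifiers())) {
                m.setAccessible(true);
                table.put(bif.module() + ":" + m.getName() + "/" + m.getParameterCount(), m);
            }
        }
    }

    Object call(String key, Object... args) {
        Method m = table.get(key);
        if (m == null) {
            throw new IllegalArgumentException("undefined function " + key);
        }
        try {
            return m.invoke(null, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("call failed: " + key, e);
        }
    }

    public static void main(String[] args) {
        BifRegistry registry = new BifRegistry();
        registry.register(DemoBifs.class);
        System.out.println(registry.call("demo:tuple_size/1", new Tuple("a", "b", "c")));
    }
}
```

Because `Tuple` copies its input and exposes no setters, Java code that receives one cannot use it as a mutable side channel, which is the property under discussion.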

Robert Virding: I was thinking there is a problem with the data security and enforcing immutability. It would be natural for someone on the outside to try and mutate the data and then use that as some sort of back door communication mechanism.

Kresten Krab Thorup: Java has the security model that allows you, within the same program, to compartmentalize and set security rules. The rules you have to follow are the rules that apply everywhere up the stack. You can’t call in through a hook and then out again and have that give you some privileges. That might be one of the interesting applications: that you can compartmentalize and set security boundaries for what you can do where.

But when you want to enable that, it also comes at a cost. It is something that has been touted about Java, and everybody who’s selling Java-related things is talking about these security things, and in practice nobody is using them. They turn them off because of performance. That applies for web servers and everything.

Robert Virding: That’s always a problem if you are interfacing with a language with immutable data and sending data backwards and forwards.

Kresten Krab Thorup: I’ve been looking into running Erjang as an applet essentially, running it in the Java sandbox, which allows you to run it on people’s desktops.

Robert Virding: A pretty big applet, isn’t it?

Kresten Krab Thorup: Including a minimal OTP it’s 6-7MB or something like that.

Robert Virding: You asked before what are the things I’d like to change about Erlang. It’s probably the wrong place to put it, but one of the things I think is wrong with OTP today is not so much the functionality but the structuring, in that there is no structuring. It’s one big block that you get. You start up and you load in OTP and suddenly you load in 30-40 modules which are just there to get all the OTP stuff running.

I think that’s wrong. I think you should be able to make a core system underneath OTP of the really basic stuff just enough to run the thing and then you load the OTP on top of that and then you load stuff on top of the OTP. You are getting a better layering of the whole system.

Kresten Krab Thorup: Layering a distribution of modules and managing the versions of them - that’s right. One of the things I think could be an interesting application of Erjang is actually being able to package up an entire release in one file. You just double-click on it and then you have the application running with everything in it.

Robert Virding: I’m really interested to see the results of that, pushing it out through the applications. Is Erjang thread safe? If you are running it, for example, together with other threads or other Java applications, is that safe?

Kresten Krab Thorup: Calling into Erjang, you would almost always have to do that by sending a message. There is a public message API: you somehow get hold of a process identifier and you can send messages to it. That is getting in; getting out the other way, you can call through NIFs essentially. But because of the calling sequences and stuff it’s not easy to call directly into Erlang.

Robert Virding: What about ports and port mechanisms? You have that but what’s on the other side from the Erlang point of view, the external side of a port? Is it another operating system process or is it a thread or what is it?

Kresten Krab Thorup: In my world view it’s just a process, very much like an Erlang process. But it calls out to a well-defined callback interface of NIFs. It packages up these 10 NIFs modeled very much after the driver API. There is output and command and control. I have the same set of APIs for doing selects. It looks like a "C style" select.

Because the Java driver API is modeled closely after the C version, I can take the code from the TCP driver almost line by line. It is a line-by-line translation because it's so complicated. At the other end of the driver there is just a Java object that implements a given interface with 10 functions or something like that.

Robert Virding: How many threads are you using?

Kresten Krab Thorup: Just one. It runs best on one thread. For certain things, I don’t have asynchronous APIs. For instance, if you spawn an external process.

Robert Virding: What do you mean "external" in this case?

Kresten Krab Thorup: Another operating system process. For instance, when you run os:cmd to run a shell command, the Java-level APIs for interacting with an external process are only synchronous. For those I have a separate thread pool which I just grow in a separate space.

There is a separate pool of threads that I use for doing things like that. The same goes for file IO; it is also only synchronous. Just like the efile driver spawns off asynchronous threads to do a read, for instance, or to do a sync or something like that, I do the same thing, but in a normal Java thread pool.

Then it comes back into the Erlang ecosystem and sends a message back. That means it runs best on one thread for all the Erlang processes and then you need a pool of threads to do the asynchronous stuff.
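
The arrangement described here, one thread for the lightweight processes plus a growing pool for blocking calls that report back by message, could look roughly like the sketch below. It is an illustration under assumptions, not Erjang's implementation: `AsyncBridge`, `submitBlocking` and `receive` are hypothetical names, and a plain queue stands in for a process mailbox.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class AsyncBridge {
    // Stand-in for a process mailbox, drained by the single scheduler thread.
    private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();

    // Pool that grows on demand; only blocking calls run here.
    private final ExecutorService blockingPool = Executors.newCachedThreadPool();

    // Run a synchronous call off the scheduler thread and deliver
    // its result back as an ordinary message.
    void submitBlocking(Supplier<Object> blockingCall) {
        blockingPool.execute(() -> mailbox.add(blockingCall.get()));
    }

    // Called from the scheduler thread; returns null if nothing arrives in time.
    Object receive(long timeoutMillis) {
        try {
            return mailbox.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    void shutdown() {
        blockingPool.shutdown();
    }

    public static void main(String[] args) {
        AsyncBridge bridge = new AsyncBridge();
        bridge.submitBlocking(() -> "result of a blocking read");
        System.out.println(bridge.receive(5000));
        bridge.shutdown();
    }
}
```

The scheduler thread never blocks on the external call itself; it only sees the result when it next reads its mailbox.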

Robert Virding: Don’t there exist asynchronous protocols or interfaces in Java or just no one has been interested in having them?

Kresten Krab Thorup: My reason for this entire exercise is getting this way of thinking, the Erlang way of thinking, practical actor programming, under my skin, because I think that has a huge future. I like to say Erlang to me looks like Smalltalk looked in 1990. Five years before it hit the mainstream, all the smart people were doing Smalltalk and they could really see the value of it and there was a lot of leverage and good things.

Then, 5 years later, Java hit, another mainstream language. I think something that carries this Erlang DNA will be carried into a mainstream language in 5 years.

Robert Virding: We were almost always very careful to make sure that things were asynchronous. There are a few cases in there, older things in the older system, that aren’t truly asynchronous. For example, sending a message to a registered process. Message sending is asynchronous, but looking up the registered process is synchronous. That was quite difficult to port and make distributed, because they had to get the same functionality which we wanted.

Finding out in a synchronous way if there wasn’t a registered process on another node was not easy to do. That was in a few cases. Otherwise, we were very careful to be asynchronous everywhere we defined it as asynchronous. Just a reflection: that may be why it works so well in a web application. That is very asynchronous, really.

Kresten Krab Thorup: We’re seeing that in the products and solutions that we do for our customers. I work in a software company. More and more systems are being integrated in a more and more distributed fashion. We’re doing some healthcare systems where we’re integrating more than 50 separate systems. We don’t control them. There is always one of them down, so we have this whole problem space of having to talk to remote servers that are not necessarily there. There are so many things in that space that all of a sudden are interesting.

Robert Virding: Apart from personal interest, do you see there is a case for using Erjang? I can quite understand you are doing it for personal interest, I would do that, there is no problem, but when it’s done will that solve one complaint people have about Erlang, that it doesn’t run on the JVM? Do you think people can accept that it is a different language, even though it runs on the JVM? I know there are other languages on the JVM.

It’s not the first time it’s not on Java, but from what I understand in many ways it’s still quite different from the other languages. Do you think people can accept it? Yes, you can show it works and you can show that you can write that type of application very well on it, but is that enough?

Kresten Krab Thorup: If it runs better, then it would be very attractive for all Erlang users.

Robert Virding: Say for non-Erlang users.

Kresten Krab Thorup: For non-Erlang users it’s as much a mental thing as it’s a real thing, but I think it will make publicizing the Erlang ideas easier, because Java people will listen to it all of a sudden; it’s within that scope of possibilities. Say you are in a big bank or something and the only way you can deploy anything to production is by putting it on some of these IBM systems that only run Java.

Then it comes into the space of the possible, from being a weird native thing that runs best on Linux and other strange things, where you have to compile it and wave a wand over it as you compile it and see if it works. All of a sudden it’s something that comes into their comfort zone. It will just be a big JAR file and you can plug it into your IBM WebSphere and it will run. I see that as a way of promoting Erlang.

Robert Virding: I have Java on my phone, but I don’t have Erlang. I’m waiting.

Kresten Krab Thorup: That will be another few weeks.

Robert Virding: That also gets back to what we talked about earlier, the size of the system and whether it’s an open system.

Kresten Krab Thorup: With the design tradeoffs and the stuff that I’m working with it’s a server side thing. It’s not really designed for running in small things, so you’ll see longer startup. It is a server style thing where you trade off startup time.

Robert Virding: The same is true of most Erlang applications today. You can’t get round the sides for 2 people whatever you do. You can’t do much about it.

Kresten Krab Thorup: BEAM loads OTP a lot faster than I do.

Robert Virding: But you do quite a lot of work when you’re loading, don’t you? Because you’re loading BEAM code literally.

Kresten Krab Thorup: Yes. Literally, and translating it into Java bytecode and then loading that. The JVM does a lot more; it does complete validation. Once the JVM gets the bytecode it’ll do a type inference and a validation that it can do no bad. Actually, loading the bytecode itself is a significant part. One thing is compiling; I have a cache for the BEAM to Java bytecode translation.

I’ll just cache it in some files and then, if the checksum of the BEAM file matches, I’ll just use the cached one. The other thing is actually taking the Java bytecode and loading it into the VM, with all the validation that the VM does. You can easily construct some BEAM code that’s hostile.
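
A checksum-keyed translation cache of the kind described here can be sketched as follows. Everything in it is an assumption for illustration: the interview does not say which checksum is used, so CRC32 is a stand-in; the cache is kept in memory rather than in files so that the example is self-contained; and `TranslationCache` and its translator function are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.zip.CRC32;

final class TranslationCache {
    // Translated output, keyed by the checksum of the input module.
    private final Map<Long, byte[]> cache = new HashMap<>();

    // Placeholder for the expensive BEAM-to-Java-bytecode translation step.
    private final Function<byte[], byte[]> translator;

    private int translations = 0;

    TranslationCache(Function<byte[], byte[]> translator) {
        this.translator = translator;
    }

    // CRC32 is an assumed choice; a real cache would likely want a
    // stronger hash to make accidental collisions negligible.
    static long checksum(byte[] beam) {
        CRC32 crc = new CRC32();
        crc.update(beam, 0, beam.length);
        return crc.getValue();
    }

    // Translate only when no entry exists for this checksum.
    byte[] translate(byte[] beam) {
        return cache.computeIfAbsent(checksum(beam), key -> {
            translations++;
            return translator.apply(beam);
        });
    }

    int translationCount() {
        return translations;
    }

    public static void main(String[] args) {
        TranslationCache cache = new TranslationCache(beam -> beam.clone());
        byte[] module = {1, 2, 3};
        cache.translate(module);
        cache.translate(module); // served from the cache
        System.out.println("translations performed: " + cache.translationCount());
    }
}
```

A cache like this only removes the translation cost. The JVM still has to load and verify the cached bytecode every time, which is the remaining startup cost mentioned above.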

Sep 24, 2010
