InfoQ Homepage Presentations Pony, Actors, Causality, Types, and Garbage Collection

Pony, Actors, Causality, Types, and Garbage Collection

View Presentation

Speed:

Download

50:51

Summary

Sophia Drossopoulou gives an overview of Pony’s programming model, actors, and causality. She introduces the type system, how it is used to allow actors to send mutable state while also avoiding data races, and how the type system is used so as to allow the actors to perform garbage collection fully concurrently with one another and with normal execution.

Bio

Sophia Drossopoulou is a Professor at Imperial College with research interests in the design and implementation of programming languages, and in program verification. She was part of the original Pony team.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Drossopoulou: Eight years ago, Sylvan Clebsch, came to Imperial College with a vision about a new language that would make a better job of concurrency and concurrent programming. I thought, there is nothing new to find there. It has all been invented. I was wrong. The following was created around this idea. Sebastian Blessing suggested that we should implement the language. He went immediately and worked on the distributed version of the language. We implemented the language. For any serious programming language, you have got a page with all the material that exists. You have got a GitHub page with all the versions. By now there is a community, which also controls the language development. You can also find the examples that I'm going to use at this location. There is a playground for you to try things out.

Pony - Goals and How

What I want to discuss are the main ideas that drove the language. The goals for this programming language was that it should be concurrent and also support distributed programming. It should have a very efficient implementation. It should be easy to write correct programs. What do we mean by concurrent? There should be one driving paradigm, and the driving paradigm is the actor paradigm. Efficient. There shouldn't be locks. You shouldn't need to use any locks. The runtime should also not be using locks. You should be able to share state without making copies. It's easy to write. It should be data race-free, data lock-free by construction. It should be safe for you to create object cycles as opposed to Rust. There should be some simplicity in the model when you program. You have got this figment of atomicity in your mind, and causality.

Pony vs. Erlang vs. Java

Is the language indeed efficient? Four years after our first excitement there, and after we started the project, we tried to run benchmarks in order to find out in particular, how does the language compare with respect to garbage collection? We created a couple of very basic micro benchmarks. Here, we are reporting how Pony behaves. It says Orca. The reason it says Orca is because the garbage collector is called Orca with respect to Erlang and with respect to Java. In Java, we're using C4, which is the best garbage collector there is. The results are tremendous. We were super happy, because we were faster with any of these. We have been honest with the benchmarks. We just wrote small benchmarks, and we compared and found these wonderful results. We were extremely happy about that. Here, we have been measuring the performance for 4, 8, 16, 32, 64 cores. Here, we have been measuring footprint. In terms of footprint, there is one example where Pony performs awfully. We have not been able to devote as much time, as we would have liked, to benchmarking. Then, we revisited the whole question a couple of years later, and we have got a paper in AGERE from this October.

Pony vs. Akka vs. CAF

We decided that we should be looking at a benchmark suite that has been developed by others, not us. The benchmark suite that we used is Savina. Savina has been proposed as a good benchmark suite to compare actor based languages. In these benchmark suites, there are 32 programs, and they are already implemented in Akka. We have asked our friends to implement them in CAF, and we have compared. Here is one example from the parallel benchmarks. Pony is the green guy and it doesn't do very well. Akka is the red guy and does best. CAF is the blue guy and does pretty good too. Then in the Sieve of Eratosthenes, Pony does much better. Then after we had 16 programs and we didn't know what to do, we computed the average, and Pony is not doing super good with a low number of cores, and is getting better. On the other hand, for the concurrent story, Pony is doing pretty much here. For this example, it is not doing as well. Overall, it is doing better than everybody else. The jury is out. These are all micro benchmarks. We are now trying to develop a benchmark to rule them all where you can tune and get more interesting results. This is work in progress. This is what we know about efficiency. It is encouraging and it is not final yet. However, Wallaroo Labs decided to choose us because of efficiency. They're a major contributor to the Pony endeavor. They have developed their own distributed version of Pony.

Features of Pony

Pony, as any serious programming language, has several interesting features. What I want to discuss is actors, causality, some aspects of the type system, and how the type system has been used for garbage collection.

The Actor Paradigm in Pony

Who knows about actors? Asynchronous messages are called behaviors, or rather, the response to the asynchronous messages. In Pony, as opposed to Erlang, the actors take the first message out of the queue. We have also got synchronous messages. Pony is an extension of the object oriented paradigm. Here is an example of an actor in Pony. It has got two fields: name and environment. This is the constructor. It has got the behavior when it gets poked, then it asks the environment to print its name. Here is the main program that creates three actors and then pokes them. Execution could look as follows. The main program creates the three actors. Then, let's look at what happens when we poke. When the actor is poked, then it asks the environment to print its name. The main program sends the message as poke. Those behaviors send print messages to the environment. It is possible for the print message from A to arrive last, even though it was sent first. It is also possible for the print message to arrive first even though here it arrives last. It is possible for poke to arrive here, but for the behavior not to be scheduled until later. It's all concurrent programming. Finally, it is possible to have times where the actor is idle and doesn't execute any behavior.

Uncertainty

The next point is causality. The reason for this is that, now, we have seen there is some element of uncertainty in the program. For instance, here we have got four actors. I indicate the actor through this rounded square. That is the name of the actor and that is the thread of the actor. These are the times when it is executing a behavior when it's active. What do the other actors do while I am executing my own behavior? Do they go and they modify the world and then I will be affected by that? When will the messages arrive and when will they be taken off the queues? This uncertainty is alleviated through types and through causal message delivery. What do types do? What they do is they give you the following guarantee. They say when a message is taken off the queue, any changes to the world will be invisible to you. The other actors may execute and modify the world, but it's not the part of the world that you can see. This guarantee holds, actually, upon message sent. As soon as the message is sent, anything that is in the world that is visible from that message is essentially unmodified until the receiving actor can start and work with it. The other reason that these uncertainties are alleviated is because we have causal message delivery. This goes beyond the guarantees that are given, for instance, in Erlang, or in Akka. There are different definitions of what causality means.

Message Delivery Example

Let's assume we have got a scenario where there is a customer who decides to go and buy stuff. Before they buy stuff, they need to make sure that they have got enough money in the bank. What do they do? There is a behavior run. They go and determine, what is the price of what they want to buy? They send a message to the bank saying, credit me with the price so that I'm sure I have got enough money. They send a message to the store that says, buy from me something with a particular price. Store and bank are fields in the customer. The store then knows about the bank, that is his own field. When the store receives the message buy, it asks the bank to debit the customer with a particular price. What does the bank do? It has got a table, a map from customers to their balances. When it gets created, it initializes the table. When it receives the credit message then it increments the balance for the particular customer. When he receives debit, it checks whether there is enough money. If there is not enough money, then it goes and complains at The Bank of England. Makes a big fuss, otherwise makes the payment. We are hoping that this big fuss will never happen. Why will it not happen? From a simplistic point of view, when the customer runs, it wants to be sure that credit will arrive at the bank before debit. We said that the guarantees that are made about the queues are weak. The fact that somebody sent something before somebody else is not necessarily a guarantee. Could it be that the debit overtakes credit, and then indeed The Bank of England is going to make a fuss?

Uncertainty Alleviated Through Causality

Luckily, the question is no. Why is it, no? The guarantee of causal delivery is the following. First, what does it mean for a message to cause another message? Either, I receive a message and then I send another message. If I receive message M, and then send M', then M is a cause for M'. If I send M and then send M', then M causes M'. Causality is transitive. What we have as a guarantee is that the message arrives at the same queue in causal order, not across different queues, because that is too distributed and difficult. At one queue, they all arrive in causal order. In particular, when we worry about this, we know that because the customer sends credit and then buy, then credit causes buy. Because the shop receives buy and then sends debit, buy causes debit. Therefore, credit causes debit. Therefore, credit is a cause of debit and therefore it arrives first field, no worries there.

Causality and Distribution

Is that an expensive feature to have? It is dirt cheap if we're working on one node. What about distributed programming? The guarantee that we have is that if we organize everything in a tree topology, if all the communication goes through a tree, then message delivery is causal. No matter how the customer, the bank, and the shop are organized in that tree, we know that we have causal message delivery.

Pony Types Reflect Execution

The other subject is how uncertainty is alleviated through types, and the type system. There is a paper in AGERE in 2015. I'm going to go briefly through what the type system offers. The very interesting thing with a Pony type system is that the types reflect execution. For every interesting operation that affects the topology in the heap, there is a corresponding operation in the type system. I have got the reference. I have got the path that gives me access to some objects. This path can be just a variable, or a variable and the field lookup, or a variable and another field lookup. I have gotten a reference to an object. What can I do with that? Can I read? Can I write? If I create a new alias to the reference, I create a new path that gets me to the same place where the old path gets me, what can I do with this new path? Can I do everything that I could do with the first path? Then, what do I do if I drop a reference? If I remove an alias to an object, can I now do more things? What if I read the field from my reference, I follow the path? What if I extract a field? I override the value of a field in an object, now the object that was the old value of the field has got one less path to it. What can I do with that? The answer is, what can I do with my reference? We have got the concept of reference capability which we call Kappa. If I alias my reference, then there is an operator on the reference capability, which is the exclamation mark. I have made one more reference to the thing. Unaliasing is the dash. I have removed the reference. If the reference had capability Kappa and the object has a capability Kappa' for the field, then the ensuing reference is this viewpoint adaptation. If I remove the reference, then it is the extracting adaptation. These are the operators. This is the backbone of the type system. Because I want to also talk about garbage collection, I'm going to concentrate on this.

Reference Capabilities

The reference capabilities are attached to a reference and they express whether the holder of that reference may read or write the object. They also express whether other aliases to the object may be written or read, and distinguishing, actually, whether the other references are local, or global aliases. I want to discuss what this local and global alias mean? Assume that we have an actor of type, this actor. It has got a field called other actor that points to some other actor. This is the diagram to understand that. Then, I also have got another field, holder. Holder is initialized so that it points to some object of class account. This gray blob is an object of class account. Then, I'm executing a function. This function declares a local variable called local alias. Local alias is assigned holder. The holder is one reference to the account and local alias is a local alias to the holder. It is a local alias because it comes from the same actor. Then I execute the behavior. I call the behavior take on the other actor, and I parse the holder. The other actor has also got a reference to the account. This is my current alias, the holder. This is the local alias and the global alias. The capability that I give to the holder needs to be so that it respects what the local alias can do and what global alias can do. The more this one can do, the less those two guys will do. If for instance, this one is allowed to take the account and give it away, then this one should not be able to read it. If this one is only allowed to read then all those should also be only allowed to read.

In Pony, we have six reference capabilities. I'm going to omit transition because it is a little bit of an edge case, and I want to get the main idea across. What rights do those capabilities give to the holder, the local alias, and the global alias? The iso capability and the ref capability say, the holder may read and write. The val and the box capabilities say, the holder may only read. The tag says, the holder cannot do anything. When Sylvan started thinking about these things, he only had this in mind, but because we set it up in a nice matrix, we realized that there is one more, which is the tag. Actually, this one more is very useful. The local aliases need to be somehow different for these two, and for these two, since the holder has identical rights. The iso capability says the local alias is not allowed to do anything, cannot read, cannot write. It can only know the identity of the thing. Whereas, the ref says the local alias might read, write, whatever they want. The val capability says the local alias is only allowed to read, while for box it says, I don't know. It's actually a union type. Tag, we don't know. The local alias might read, write. We know nothing. Since here, locally, at least one reference is able to write, then the global alias is not allowed to neither read nor write. Otherwise, we would have data races here. Since we want to be sure that we can read, then the global alias is also only allowed to read. Here, the global alias is also allowed to read. Which of those can I send across actors? I can send iso, because I know that I'm the unique one who can read, so if I give up my iso, and I give it to Justin, then I know that nobody's going to be in trouble. I can also give val because I know everybody is in agreement, nobody is going to read from it, so it can be shared across. I can give tag, because when I give it I give it as tag. I say, you are not allowed to look inside it. Tag comes up a lot when you have lookup tables, and things like that, when you only want to pass the identity to somebody else who has got the rest of the information locally.

The reference capabilities are attached to the references, which are paths, things like x, x.f, x.f.g when we're looking at more fields. They express whether the holder is allowed to read or write, and say something about the other objects. Also, note, that not only do we have capabilities for variables and arguments, we also have got capability for the receiver. Here, we are saying this function is modifying the receiver, therefore, it needs ref capability for the receiver.

Pony Code - Capabilities (Find the Type Errors)

We have got a person here who has got identity and strengths. When they eat, their strength increases. We saw that function already, by some amount that is taken from the food. The take_a_bite function from food decreases the calories and returns parts of the calories. Here in the main actor, we have got an apple variable. Then we have got a pear variable. They're both food. Laurie and Jan are people, and they eat the apple and the pear. If we look at the problems here, I am modifying the calories, and therefore the function take_a_bite should have a ref receiver. Here we are calling take_a_bite, which requires the food to be a ref but here it is a box. We need to change it. This was a short and fast description of what the type checker is doing.

Capabilities and Object Heap (excl. tag)

What do these capabilities really mean? What we have here is two actors, A1, A2. Here is the timeline of the actor, progress with time. Let's say this way. Here is the activation of the actor, forget about time. This is a snapshot. Here is actor A1, A2. Here is what the actor has access to. The actor A1 has got a local variable of type iso that points to that object. All these round guys here are objects 10, 12, 13, 20, and so on. I'm indicating through the black arrows, the references that come from one object to another object through each field. We can imagine that here, 10 has got some field, I don't care what it's called. This field has got a reference capability, and this field is pointing to 12. 12 has got another field that has got a reference capability that points to 13. 13, a box that points to 12, and also a ref that points to 10. This is a topology that is possible when I'm using those capabilities.

What does the topology mean? The isos introduce partition of the mutable heap into this joint region. These turquoise triangles indicate the objects that belong to the same region. We see that there are four regions, one starts at 10, the other at 20, the other at 30, the other at 50, because they have got all an iso pointing to them. There is one immutable region, and these are the blue objects. It's indicated through these light blue squares. These are introduced whenever I have got a val capability. From 12 to 10, I have got a val capability. I know that everything that is reachable from 100 is going to be immutable. All the objects here are immutable. Then with this approach, I can have my cycles, as I have got them here and I've got them here, which I couldn't have in Rust. Which are nice to have when you want them and you don't need to start worrying about unsafe programming. There are no incoming pointers into the mutable region. 21 cannot have any mutable reference to 12. It could have a tag reference, but I'm excluding it here in order to tell my story. 21 cannot have a mutable or even a box reference to 12 because it would go directly inside somebody else's region. 13 cannot point to 20 because 20 is an iso and there is only one distinct entry into a region.

Then, what we see is, there is at most one actor at a time that has got any access to the objects in a mutable region. Here, A1 has got access to the objects in that region and in that region. A2 has got access to the objects in that region and that region. A1 can go in there and create more objects, modify objects, change what they are pointing to. Perhaps even do garbage collection, and nobody else is going to be affected by that. A2 can do the same. They can both go inside the blue region and read stuff. Since all that is immutable, it's going to be safe. What we have is data race freedom by construction. The type system makes sure that we don't have any data races, and we don't need to log, we don't need barriers, we don't need anything. The type system does it all for us.

Capabilities and Safe Communication

How do we do communication? We want to be able to pass data around. Ideally, we would like also to be able to pass mutable data. There is this requirement, at most, one actor at a time has access to a mutable region. That means that the val reference can be sent safely to any other actor. It is totally safe to send the object 100 to actor A2, because it's an immutable object, and there are no restrictions about access to the immutable region. You can go and read anything you want, as often as you want. What about the mutable stuff? I can give up a reference and then I can send it to another actor. For instance, I might want to give up the reference that goes from 30 to 50. Then it is safe to give it to A1. I still have got my guarantee, at most one actor has got access to a mutable region. Similarly, I might want to give up my reference to 30 and pass it to A1. We have got the guarantee of data race freedom. What that means is we can share mutable state without copying and without any data races.

Guarantees of the Type System

What guarantees does the type system give me? It gives me no data races because at most one actor has got access to a mutable object at any time. Immutability is deep and permanent. As soon as something is in the immutable region, it remains there. Stuff can move from the mutable to the immutable by giving up capabilities and turning isos to val, and stuff like that. Everything that is in the immutable region, stays there, everything that is reachable from there is immutable. The capabilities become weaker with the distance. If I have got a long path that is allowed to write, then every shorter path of 8 is also allowed to write. We have got this figment of atomicity that says, if upon message receipt, an actor sees an object at certain capabilities that is a read or write, then if later on the contents of the object is different than what it was upon receipt, then it was I who made the change. It means I cannot observe anything that changes because of other actors, any change is due to me.

The Figment of Atomicity

The semantics that we have for behavior execution is interleaved. Let's say A1 is executing behavior. Then other actors execute, and then he continues with his behavior here. This is one behavior. A2, his behavior is broken into 3. A3 is broken into 3. I as a programmer, I don't need to know any about this, I can believe that all my behaviors are atomic. This one will finish on its own. This one will finish on its own, and so on. Or any other order but without the interleaving. That simplifies the programming model.

Garbage Collection

It came as a surprise to me how important garbage collection is. It is super important. There is Doug Lea, who is the master of concurrency and also garbage collection, talking about the issues that arise. In Pony, we have got a garbage collection protocol that is based on ownership and the reference count. We have Juliana, my PhD student found the wonderful name Orca, for it. What's so great about Orca? There are a couple of papers where you can read about Orca. Here, we describe how it works, and some of the benchmarking. Here, we describe the model that makes sure that it works. It was actually surprisingly difficult to make the model, and fun. Here, we describe another protocol that has to do with the collection of actors, and not the collection of objects.

Pony Garbage Collection Is Fully Concurrent

The wonderful thing about the garbage collection in Pony, is that it is fully concurrent. There is no synchronization, no locks, no barrier, no stop the world, no nothing. We have got the actors. This bit is behavior of an actor. Here is where the actor is idle. Here, behavior, and another behavior, and the actor is idle. The black blobs or the black squares is where the actor executes garbage collection. The interesting thing is that at this point in time, this actor is executing the behavior. He's doing garbage collection. He's doing behavior. It can be fully concurrent. This is what the model says. We wanted to have experimental evidence that indeed this is what is happening. Juliana instrumented execution on a 4-core machine. For each of those cores, she could record whether at some time the core was executing behavior, was executing sending or receiving messages, or executing garbage collection. The time goes like this. For instance, this part here is magnified here. This actor is doing behavior, receiving behavior, and then GC. While this actor is doing sending, receiving behavior. This is doing GC. It is indeed fully concurrent.

Garbage Collection and Concurrency Challenges

What are the challenges and what have we done about them? First challenge is, who collects the object? Second, how do we avoid data races between garbage collector and mutation, garbage collection and behavior? After all, the garbage collector needs to go and trace the object graph. It needs to find what is accessible from the frame or from the fields in an actor. The mutator goes and modifies that at the same time. The answer to who collects the object is the allocating actor. Whoever created an object is the owner and he is responsible for finding out when the object is no longer accessible and to remove it.

Given what I have been talking about, you know the answer to how to avoid data races between garbage collection and mutation, or between tracing and mutation? This is the type system. How does the owner know whether there are foreign reference to its own objects? How does the owner know whether other actors are interested or have a path to their object? Here, we have got a scheme where the owner keeps reference counts for all the objects that the other actors are interested in, and also, a reference count for the objects that he's interested in. When these things are modified, he goes and notifies the corresponding actor. He goes and modifies, means send messages. Here, we do have the problem of what happens if one message overtakes the other message. There are increments and decrements messages, and we need to know that the decrement messages do not overtake the increments, but we have got causal message delivery. It all works. We have got a case where there is a very tight connection between the language design and the runtime system. To my knowledge, this is totally novel. There were schemes already where the type system was helping the garbage collector, but they were not as intricate because all you knew is that there was some domination graph. As soon as you remove a dominating link, you can remove everything underneath. Here, we can have a much more fine-grained garbage collection. We don't rely on the regions anymore.

I'd like to outline roughly what is happening here. There are three actors. This actor owns 11, 12, and 13. He has created all these objects, but he doesn't have any links, any paths to those objects. For instance, actor 2 is interested in 11, 12, 20, 21, and 23. Therefore, those objects should not be garbage collectors. The challenge is that the owning actor might not have paths to its live objects, but the objects are still, nevertheless, live. There are cycles, potentially, in the object graph. Cycles are problematic with reference counting garbage collecting. The reference counting is the number of the actors who are interested in something, not the cycles, not the number of references within the heap.

Roughly, what happens is that the owning actor keeps another bound on the number of actors we have got paths to the owned object. It collects the object when this number is 0. The foreign actor, the actor that does not own the object keeps count of the reference to unowned objects. This actor A2, needs to remember that it's interested in 11, which it does not own. When the number of references to unowned object changes, then it needs to notify the owner. When an actor receives a message, it receives objects with a message. This number of references changes. It needs to tell the owner when it sends messages that it also needs to change. Also, when it does tracing, and it discovers that it's no longer interested in an object, it tells the owner. What we have is a system that is sound and complete, which means, eventually, we remove all the dead objects, and we only remove dead objects. Here is our evaluation with respect to Erlang and Java using C4. We have distinguished between the mutator time, the overhead of the mutator, the concurrent GC, and stop the world step, which works very well. The footprint is not that great.

Responsiveness

The other thing that we measured is responsiveness. We ran different benchmarks where we did a sequence of requests. The requests were some Fibonacci calculation, or something like that. We measured the difference in completion time between subsequent such requests. This is jitter, or we think it is. We saw that in the case of Orca, the jitter is very low, in Erlang, it's higher, and in C4, it's even worse. This is another indication that things are very encouraging.

We have got several features in the language. There is a large group of people who have contributed. Some of them were at Imperial College. There are several people outside and several people on GitHub. It has been a very interesting and rewarding journey for me. Please have a look at the language.

Questions and Answers

Participant 1: From an architecture point of view, how do you decide, what's an actor and what's an object?

Drossopoulou: An actor is something that can execute concurrently, whereas the object is passive. The actor is something active. I did not have that problem when writing code, of determining when something should be an actor. I have been writing some code in Verona, and there, I had the problem of when is something a resource. I don't know why I didn't have the problem.

I think the things that execute concurrently are actors. When we looked at the Savina benchmark suite, it was immediately clear what should be the actors, even without looking at the code written in the other languages.

Participant 2: I'd like to know a little bit more about project Verona.

Drossopoulou: That's a different talk. Ask me afterwards.

Participant 3: These look like really useful capabilities to have, how come you decided to put them into a new language rather than maybe into the library of an existing one? I'm thinking, particularly, of the time system here.

Drossopoulou: If you put them into a library, you would need to make sure that the whole language follows the library's type system. It's a very invasive change. Plus, then the runtime system relies on the capabilities.

Participant 4: I've tried a little bit of my own with the Pony language. One thing that keeps catching me out is the fact that garbage collection doesn't run during a loop. What's the reason for that?

Drossopoulou: It is totally an engineering reason. It is easier to trace when all you're interested in is the fields. You don't run garbage collection during a behavior, you only run it between behaviors. That means that you only need to trace from the fields of the actor. If you were to trace during a behavior, then you need to work the frames of the activation stack and find at which location is which object, of which type? If anything has been inlined, it's even more work. From the protocol point of view, nothing changes. From the engineering point of view, it's a lot more. We have come across that problem too.

Participant 5: You gave the impression that you were not expecting to build a community around a new language. Has it been a rewarding and interesting thing of its own to do that?

Drossopoulou: Yes. I was not expecting it to happen because when Sylvan came, I thought, there is nothing much new to discover. There was. It has been wonderful to get in touch with so many people who were interested in it. My own students worked on parts of the compiler. We then experimented with other features, for instance, how do we do object initialization in Verona, path dependent types? All sorts of things that theoretically are interesting to somebody else, thus now, they become super urgent because they have to be in our language. It's a great thing to be doing.

See more presentations with transcripts

Recorded at:

Aug 30, 2020

Sophia Drossopoulou

InfoQ Software Architects' Newsletter