Using the Actor-model Language Pony for FinTech
During his opening keynote at QCon London on Monday morning, Adrian Colyer mentioned the Pony language:
We're very familiar with the fact that databases and distributed systems have had a lot of influence on each other over the last 5+ years. Now we're starting to see some really interesting work on programming languages that fit into that world. See for example Pony, which came out of Imperial College in London and is really fascinating stuff.
We were fortunate enough to have the designer of the language, Sylvan Clebsch, give a talk on the native languages track on the Wednesday. Clebsch suggested that Pony is a natural fit for FinTech systems since "...in FinTech we don’t write software, we write time-dependent event stream processors that are performance critical but not formally verified". Most of these are written in Java and C++, though other languages are also used, including Scala, C, OCaml, Erlang, R and Python with NumPy.
Pony is an actor-model, capabilities-secure native language that is compiled ahead of time using LLVM. The actor model, probably best known from Erlang and, more recently, Akka, came from work by Carl Hewitt and others, starting with a paper in 1973. An actor combines state management with asynchronous methods. In addition to fields, an actor has a single message queue and its own heap. In Pony, Clebsch stated, actor heaps are garbage collected independently and, unlike in Erlang or Akka, the actors themselves are also garbage collected, so you don’t need something like a poison-pill message to kill them; in essence, there is no manual memory management.
Each actor garbage collects its own heap independently of other actors, using a mark-and-don’t-sweep algorithm. As a result, the cost of a collection is O(n) in the size of the reachable graph; unreachable memory has no impact on it. The actor heap GC has no safepoints, no read or write barriers, no card table marking and no compaction. Since it doesn't compact, it has no need for pointer fix-ups.
A pass over a single actor to collect its local heap therefore amounts to tracing the reachable graph, with no other associated work of any kind, so the jitter when an actor collects its own working set is quite low. What's more, an actor does this collection between behaviours, before handling the next one, rather than in the middle of one.
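To make that cost model concrete, here is a toy Python model of one actor's heap under mark-and-don't-sweep. It is a conceptual sketch, not Pony's runtime. The ActorHeap class, the epoch counter used instead of clearing mark bits, and the linear free-slot scan are all assumptions made for brevity. It shows two things: a collection visits only objects reachable from the roots, and unreachable slots are never swept, because an out-of-date mark is enough for the allocator to reuse them.

```python
class ActorHeap:
    """Toy model of one actor's heap, collected with mark-and-don't-sweep.

    Conceptual sketch only: a real runtime heap uses pages and bitmaps,
    not Python lists.
    """

    def __init__(self, capacity):
        self.refs = [[] for _ in range(capacity)]  # slot -> slots it points to
        self.mark = [-1] * capacity                # epoch in which the slot was last marked
        self.epoch = 0                             # current collection epoch
        self.traced = 0                            # objects visited by the last collection

    def alloc(self, refs=()):
        # A slot is free if it was not marked in the current epoch. Nothing ever
        # sweeps it; the stale mark alone says it can be reused. (A real allocator
        # would find free slots with bitmap operations, not a linear scan.)
        for slot, m in enumerate(self.mark):
            if m != self.epoch:
                self.refs[slot] = list(refs)
                self.mark[slot] = self.epoch
                return slot
        raise MemoryError("actor heap is full")

    def collect(self, roots):
        # Bumping the epoch implicitly unmarks every slot without touching it.
        self.epoch += 1
        self.traced = 0
        stack = list(roots)
        while stack:
            slot = stack.pop()
            if self.mark[slot] == self.epoch:
                continue  # already marked in this collection
            self.mark[slot] = self.epoch
            self.traced += 1
            stack.extend(self.refs[slot])
        # No sweep and no compaction: unreachable slots just keep an old epoch,
        # so the work above depends only on the reachable graph.

    def live_count(self):
        return sum(1 for m in self.mark if m == self.epoch)


heap = ActorHeap(capacity=1000)
root = heap.alloc()
child = heap.alloc()
heap.refs[root].append(child)
for _ in range(500):
    heap.alloc()  # garbage: nothing points at these objects
heap.collect(roots=[root])
print(heap.traced)  # 2
```

The 500 garbage objects add nothing to the cost of collect(). They matter again only when alloc() hands their slots back out.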
Pony actors have no blocking constructs. They are cheap: an actor has an overhead of 240 bytes compared to an object, or 156 bytes on a 32-bit architecture such as 32-bit ARM. They also have no CPU overhead when they have no work to do. As Clebsch put it, “If an actor has no work to perform, it’s not even in a queue anywhere. The runtime has no knowledge of it, of any kind, unless it has pending work to perform”.
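The "not even in a queue anywhere" property can be sketched with a toy single-threaded scheduler in Python. The Actor and Scheduler classes and the batch parameter are hypothetical and are not Pony's runtime API. An actor goes onto the run queue only when a message arrives at an empty mailbox, and it leaves the queue once its mailbox drains, so an idle actor costs the scheduler nothing.

```python
from collections import deque


class Actor:
    def __init__(self, name):
        self.name = name
        self.mailbox = deque()  # pending messages
        self.handled = []       # record of processed messages, for illustration

    def receive(self, msg):
        self.handled.append(msg)


class Scheduler:
    """Toy single-threaded scheduler: only actors with pending messages are queued."""

    def __init__(self, batch=100):
        self.run_queue = deque()
        self.batch = batch  # hypothetical cap on messages handled per turn

    def send(self, actor, msg):
        was_empty = not actor.mailbox
        actor.mailbox.append(msg)
        # Only the empty -> non-empty transition schedules the actor, so an idle
        # actor sits in no queue and costs the scheduler nothing.
        if was_empty:
            self.run_queue.append(actor)

    def run(self):
        while self.run_queue:
            actor = self.run_queue.popleft()
            for _ in range(min(self.batch, len(actor.mailbox))):
                actor.receive(actor.mailbox.popleft())
            if actor.mailbox:  # work left over: take another turn later
                self.run_queue.append(actor)


sched = Scheduler(batch=2)
worker, idle = Actor("worker"), Actor("idle")
for order in ("o1", "o2", "o3"):
    sched.send(worker, order)
sched.run()
print(worker.handled)           # ['o1', 'o2', 'o3']
print(idle in sched.run_queue)  # False: never scheduled
```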
Actors pass messages using message queues that are intrusive: because a message never needs to be in more than one queue, the link to the next message can be stored in the message itself. More controversially, the queues are also unbounded since, as Clebsch stated, “if the queue was bounded then, when the queue is full, you have to either block or fail.” Blocking can introduce deadlocks, whilst failing would require application-specific error handling every time a message is sent. Bounded queues are used to avoid the back pressure problem but, Clebsch argued, although unbounded queues move the back pressure problem, they don’t make it worse. At the time of writing, the Pony runtime does not do anything to provide generalised back pressure. Clebsch told InfoQ:
That's not the end of the world: it's pretty easy to write domain-specific back pressure, such as the back pressure in TCPListener, which stops accepting new connections when the open connection count exceeds a specified number.
In the next couple of months, generalised back pressure will land in the runtime. What this does is automatically deprioritise actors that send to "loaded queues".
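The trade-off in the mailbox design is easier to see side by side. The toy Python sketch below contrasts an intrusive, unbounded queue, where push can never block or fail, with a bounded queue.Queue, where a full queue forces the sender to fail or to block. The names Message, IntrusiveQueue and send_bounded are illustrative. Pony's real mailbox is a concurrent queue in its runtime, and this single-threaded sketch does not try to reproduce that.

```python
import queue


class Message:
    def __init__(self, payload):
        self.payload = payload
        self.next = None  # intrusive link: stored in the message, not a separate node


class IntrusiveQueue:
    """Unbounded FIFO mailbox whose links live inside the messages (toy, single-threaded)."""

    def __init__(self):
        self.head = None
        self.tail = None

    def push(self, msg):
        # Never blocks and never fails: there is no capacity to run out of.
        # This only works because a message is in at most one queue at a time.
        if self.tail is None:
            self.head = self.tail = msg
        else:
            self.tail.next = msg
            self.tail = msg

    def pop(self):
        msg = self.head
        if msg is not None:
            self.head = msg.next
            if self.head is None:
                self.tail = None
            msg.next = None
        return msg


def send_bounded(q, item):
    """Non-blocking send to a bounded queue.Queue: the sender must handle failure.

    The alternative, q.put(item), blocks while the queue is full, which can
    deadlock if the receiver is itself waiting on the sender.
    """
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


mailbox = IntrusiveQueue()
for i in range(3):
    mailbox.push(Message(i))
print(mailbox.pop().payload)  # 0

bounded = queue.Queue(maxsize=1)
print(send_bounded(bounded, "a"))  # True
print(send_bounded(bounded, "b"))  # False: full, so the sender must fail (or block)
```

The unbounded version does not remove back pressure. A fast producer can still grow the mailbox without limit, which is the gap the domain-specific and generalised mechanisms described above are meant to close.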
Fundamentally, the actor model is about expressing concurrency, and hard concurrency problems are the main area Pony has been designed for. Key to that design is the type system, which is data-race free, concurrency-aware and proven sound. According to Clebsch, no other language combines mutability with a data-race-free type system, though Rust achieves the same thing through a combination of its type system and atomic reference counting.
Pony has no null. The type system is built on algebraic data types so, in that sense, it can be considered a functional language. An example in the talk slide deck shows code to create an order in a very basic order management system.
The ReadSeq[OrderObserver] iso type in that example introduces one of the most important, and novel, concepts in the type system. Isolated (iso) is a reference capability offering a guarantee that is built on deny properties. It is these reference capabilities (rcaps) that make the type system data-race free.
“It's not what you are allowed to do, it's what their existence proves cannot exist anywhere else in your program statically. So isolated says it denies both local and global aliases which can either read from or write to the object. That’s an incredibly powerful deny guarantee. It means that the most anyone other than you can know about this mutable sequence is its address. They can’t read its field or write to its field. That means it is safe to send it to a new actor even though it remains mutable without locks of any kind,” Clebsch said.
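Pony enforces this at compile time. An iso reference is typically given up with Pony's consume expression before it is handed to another actor. Python has nothing comparable, so the following is only a run-time analogy. The hypothetical Iso wrapper moves its value to a new handle on consume() and refuses any further use of the old handle. Unlike Pony, it cannot stop code from holding on to the list returned by get().

```python
class ConsumedError(Exception):
    """Raised when a consumed (moved-from) handle is used."""


class Iso:
    """Toy model of an isolated reference: one live handle to a mutable value.

    Pony checks this statically; this sketch can only check at run time, and
    only for access through the handle itself.
    """

    def __init__(self, value):
        self._value = value
        self._live = True

    def get(self):
        if not self._live:
            raise ConsumedError("handle was consumed; the value now belongs elsewhere")
        return self._value

    def consume(self):
        # Transfer the unique reference: the old handle becomes unusable, so the
        # new holder can mutate the value without locks.
        value = self.get()
        self._live = False
        self._value = None
        return Iso(value)


observers = Iso([])
observers.get().append("risk-check")
handed_off = observers.consume()  # e.g. just before sending to another actor
handed_off.get().append("audit-log")
print(handed_off.get())           # ['risk-check', 'audit-log']
try:
    observers.get()
except ConsumedError as err:
    print("sender can no longer touch it:", err)
```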
Rcaps are type annotations that indicate a level of isolation or immutability:
x: Foo iso // An isolated Foo
x: Foo val // A globally immutable Foo
x: Foo ref // A mutable Foo
x: Foo box // A locally immutable Foo (like C++ const)
x: Foo tag // An opaque Foo
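The annotations above can be summarised by the deny properties each one guarantees. The Python table below encodes the deny matrix described in the Pony papers, including a sixth capability, trn (transitional), that the list above omits. The DENY table and the sendable helper are illustrations, not compiler code. The rule they encode recovers the set of capabilities that can be passed between actors: iso, val and tag.

```python
# Deny properties of Pony's reference capabilities, as a lookup table.
# Each entry says which *other* aliases the capability guarantees cannot exist:
#   local  = aliases held by the same actor
#   global = aliases held by other actors
#   "rw"   = no alias that can read or write
#   "w"    = no alias that can write
#   "none" = no guarantee
DENY = {
    #       (local, global)
    "iso": ("rw", "rw"),
    "trn": ("w", "rw"),    # not in the list above; included to complete the table
    "ref": ("none", "rw"),
    "val": ("w", "w"),
    "box": ("none", "w"),
    "tag": ("none", "none"),
}


def sendable(cap):
    """A capability may cross an actor boundary if its local and global guarantees match.

    Once a reference moves to another actor, the sender's remaining aliases are
    global from the receiver's point of view, so the guarantee must not depend
    on which side an alias is on.
    """
    local, global_ = DENY[cap]
    return local == global_


print(sorted(cap for cap in DENY if sendable(cap)))  # ['iso', 'tag', 'val']
```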
It’s important to note that data-race freedom through rcaps is checked by the compiler during type checking, so the amount of compiler work does not grow non-linearly as your code base grows. Colyer provides an excellent summary of the paper that describes this in more detail on his blog, The Morning Paper.
Rcaps allow isolated (iso), immutable (val) and opaque (tag) objects to be passed by reference between actors, so the runtime needs some way to prevent premature collection of objects that are in messages (where no actor may currently hold a reference) or that are reachable from other actors. Pony uses a message protocol for this, described in a paper that Colyer has also written up. The approach is analogous to a consensus algorithm, and Colyer draws parallels with the Chandy-Lamport distributed snapshot algorithm. The Pony paper, "Ownership and Reference Counting Based Garbage Collection in the Actor World" (Clebsch et al., 2015), states:
When an actor sends, receives, or drops a reference to an object it does not own, it sends protocol-specific messages to the owner. These protocol-specific messages result in the owner updating its (local) reference count.
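A deliberately simplified Python model of the quoted sentence follows. Owner, Peer and the "inc"/"dec" protocol messages are hypothetical names. The sketch leaves out most of what the paper's protocol actually does: it reports only sends and drops, and it processes protocol messages in the order they were sent. Its only purpose is to show why a reference must be counted before it leaves in a message. That is what stops the owner from freeing an object that no actor is holding at that moment.

```python
from collections import deque


class Owner:
    """Toy owning actor: counts references to its objects held by other actors."""

    def __init__(self):
        self.inbox = deque()  # protocol messages arrive asynchronously
        self.rc = {}          # object -> outstanding foreign references
        self.local = set()    # objects still reachable from the owner's own heap

    def send_ref(self, obj, to):
        # The owner updates its own count directly; no protocol message needed.
        self.rc[obj] = self.rc.get(obj, 0) + 1
        to.held.add(obj)  # delivery is instantaneous in this toy

    def process_protocol(self):
        # Relies on protocol messages being processed in the order they were sent.
        while self.inbox:
            kind, obj = self.inbox.popleft()
            self.rc[obj] = self.rc.get(obj, 0) + (1 if kind == "inc" else -1)

    def collectable(self, obj):
        # Free only when the owner itself no longer reaches the object and every
        # reported foreign reference has been released.
        return obj not in self.local and self.rc.get(obj, 0) == 0


class Peer:
    """Toy non-owning actor."""

    def __init__(self, owner):
        self.owner = owner
        self.held = set()

    def send_ref(self, obj, to):
        # Report the copy travelling in the message before it leaves, so the
        # object is never unaccounted for while no actor holds it.
        self.owner.inbox.append(("inc", obj))
        to.held.add(obj)

    def drop_ref(self, obj):
        self.held.discard(obj)
        self.owner.inbox.append(("dec", obj))


owner = Owner()
a, b = Peer(owner), Peer(owner)
owner.local.add("order-42")
owner.send_ref("order-42", a)     # count: 1
owner.local.discard("order-42")   # the owner's own heap no longer uses it
a.send_ref("order-42", b)         # a reports the copy it sends to b
a.drop_ref("order-42")
owner.process_protocol()
print(owner.collectable("order-42"))  # False: b still holds it
b.drop_ref("order-42")
owner.process_protocol()
print(owner.collectable("order-42"))  # True
```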
Pony is still in its very early stages, and some significant items, including reflection and hot code loading, are non-trivial and not yet resolved. That said, whilst a recent survey suggested that the vast majority of users are still just trying out the language, some are further along. For example, Sendence, a New York company, has a FinTech product that it plans to put into production soon.
Pony is an open-source language, and contributions are welcome. There is also a sandbox where you can try it out yourself.