Virtual Panel on Reactive Programming
Reactive programming is a very hot topic: libraries for building reactive systems are mushrooming on many platforms and languages. Initiatives like the Reactive Manifesto (watch the InfoQ interview with Francesco Cesarini and Viktor Klang) are promoting the idea, and with Reactive Streams (also see Reactive Streams with Akka Streams on InfoQ) there are even efforts to provide interoperability between reactive libraries.
But what does it mean to be reactive? How do implementations differ from each other? InfoQ brought together three proponents of reactive programming in a virtual panel to find out and learn more about the individual projects.
Note: we probably missed your favorite library to handle streams of data and other reactive use cases. Tell us in the comments which other libraries you'd like to see covered (and a good panel participant to contact) and we'll try to put together a sequel to this virtual panel.
Viktor Klang is a Chief Architect at Typesafe and former Akka project leader.
Timothy Baldridge is a Clojure Developer at Cognitect and Core.Async committer.
Jafar Husain is a Technical Lead at Netflix and contributes to RxJava.
We asked them to answer the following questions:
- Please give us a short introduction to your library or framework.
- How do you achieve reactiveness/concurrency? On what primitives, concepts, or language constructs does it build? Can the programmer control how the code is executed or deal with back pressure?
- How does it integrate with the facilities of the platform, i.e. I/O libraries, Collections, available algorithms, etc.?
- Compared to other approaches in your language, why is your solution better? Does it mainly prevent mistakes, lower the hurdle, or enable new styles of programming?
- For what kind of problems is your library the best solution, or what was the original motivation to implement it? And when would you use something different?
- Do I need to rethink the way I program? That is, are there any restrictions, can I only use pure functions, do I wrap everything in a monad or have to pass continuations?
- Does your programming language or platform bring any benefits, or are there things it complicates or makes outright impossible? Were there particular implementation challenges that you'd like to share?
InfoQ: Please give us a short introduction to your library or framework.
Viktor Klang: Akka is a library and runtime for Reactive applications on the JVM (primarily Java and Scala, but bindings to JRuby, Clojure and more exists) which employs Actors (see Actor Model) as its primary construct for concurrency; distribution; resilience; and scales from tiny devices to massive servers and from a single node up to thousands in a cluster.
Jafar Husain: Rx poses the question: What's the difference between an Event and a collection like an Array or a Set? Today most developers program against events and collections very differently. Rx presents you with a unified programming model for both events and collections. The library enables you to work with events the same way that you work with collections, transforming them using functions like map, filter, and reduce. Rather than building state machines to respond to sequences of events, you use these methods to create complex events from simple ones.
InfoQ: How do you achieve reactiveness/concurrency? On what primitives, concepts, or language constructs does it build? Can the programmer control how the code is executed or deal with back pressure?
Timothy Baldridge: The code is organized into logical threads. These threads may or may not map 1:1 with OS threads, that is up to the programmer. Communication between logical threads is via blocking queues known as channels. These channels provide back pressure and support multiple readers and writers on the same channel. Due to the simplicity of this model it is quite easy to scale to greater levels of parallelism as needed, one simply allocates more logical threads to read from or write to the same channel.
Jafar Husain: RX introduces a new collection type: the Observable. An Observable is similar to an event on a UI element, but an Observable is a first-class object which can be passed around just like a List or a Set.
Like an Event, you can subscribe a callback to an Observable and be notified whenever data arrives asynchronously. However an Observable adds the notion of completion to event streams. This simple addition enables developers to build complex, event-driven systems without ever having to explicitly unsubscribe.
Normally when building long-lived applications like user interfaces you must be careful to unsubscribe from events in order to avoid memory leaks. Events will hold onto your handlers, even in situations in which they will never fire them again (ex. "document.onload" which fires when a web page DOM has loaded). By contrast when Observable streams of data complete, your handler is automatically unsubscribed.
You can control when an operation occurs by using a Scheduler. A Scheduler controls when and where an Observable notifies you that data has arrived. Schedulers can ensure that subscriptions or notifications happen on a different thread, or on the same thread after certain period of time.
For example, you can parallelize the execution of two Observables by scheduling each of them to run on the thread pool. In single-threaded environments like JS, schedulers can be used to observe an event stream at key times in the event loop,
With regard to back pressure, it is always possible to unsubscribe from an Observable to stem the flow of data and then resubscribe later. More sophisticated approaches are currently being explored, and you can expect to see them in future versions of Rx.
Viktor Klang: As mentioned previously, the main abstraction is Akka Actors ("Actors"), a JVM implementation of the Actor Model. An Actor is logically isolated from other Actors, which means that Actors run concurrently with each other, inside what is called an ActorSystem—a logical hierarchy of Actors.
At the core of an Actor is a behavior which it applies to its incoming messages, and you communicate with the Actor by sending it messages which it then processes one at a time. While processing a message it can decide to: create new Actors; send messages to Actors it knows about; change behavior for the next message it processes; or any combination of these actions. Both the creation of Actors and sending messages to them is performed asynchronously. This means that Actors are inherently Event-Driven.
An Actor has:
- a "parent" Actor which created it
- a behavior which it will apply to the next message it processes
- a Mailbox where it stores inbound messages while it is busy processing
- a Dispatcher which coordinates the execution of the Actor
- 0..N "child" Actors which it has created
An Actor is a very light-weight construct—typically weighing in at around 450 bytes—which means that you can run millions of them concurrently on commodity hardware.
Execution of Actors (the application of messages to behaviors) is done by Akka Dispatchers, which are typically backed by an ExecutorService. All configurable by the programmer—both programmatically and externally through configuration. In fact, most things are configurable and customizable by the programmer: Mailbox implementation, ExecutorServices, Dispatchers etc.
Backpressure is implemented by Acking/Naking inbound messages—this means non-blocking, asynchronous backpressure.
Resilience is achieved through something called Supervision, where thrown exceptions inside an Actor are escalated to that Actor's parent while execution is suspended for the failing Actor until the parent decides how to handle the failure. The following outcomes are possible:
- Resume (ignore the failure and resume processing message for the failing actor)
- Restart (throw away the old instance of the failing actor and create a new one, keeping the Mailbox intact)
- Stop (terminates the failing actor)
- Escalate (re-throws the failure, escalating the problem to the "grandparent")
This Supervisor Hierarchy means that Actors can create new Actors to perform potentially risky operations, and then use Supervision to deal with any failures, without putting themselves at risk. This is commonly referred to as "The Error Kernel Pattern".
What's interesting is that this also works when the hierarchy is spread out over multiple nodes, due to the message-driven nature combined with Failure Detectors.
A common pattern is also to have groups of Actors use different Dispatchers & ExecutorServices to isolate their execution—so if one group runs amok, the others are unaffected. This is commonly referred to as “Bulkheading”.
InfoQ: How does it integrate with the facilities of the platform, i.e. I/O libraries, Collections, available algorithms, etc.?
Rx provides helper methods to transform any asynchronous interface into an Observable. This makes it easy to gradually integrate Rx into any existing program. You can start small, using composition on a subset of the events in your system, and then gradually expanding it.
Viktor Klang: Akka has an IO library which builds on top of Java's NIO capabilities that exposes IO as simple and familiar message passing: sending and receiving chunks of data.
For Collections you are free to use whatever you want inside your Actors, but when sending messages with collections in them, it is strongly advised to send immutable collections.
That aside, you can use any JVM library of your choice.
InfoQ: Compared to other approaches in your language, why is your solution better? Does it mainly prevent mistakes, lower the hurdle, or enable new styles of programming?
Viktor Klang: Actors are a general-purpose construct that unifies concurrency, distribution and failure management, which enables scaling both up & down and in & out. For more specialized use cases other tools like Futures, Agents, etc may come in handy.
Timothy Baldridge: Often, when an application reaches a certain size, programmers will create distributed queues to break the program up into smaller more manageable pieces. This model also opens more options for scalability as programmers can then tweak the number of readers and writers to a queue as needed to improve performance.
The question them must be asked, why don’t we build our systems this way from the start? Why not start with a highly decoupled system by designing our applications as sets of logical threads communicating over in-process queues. Then when the time comes to scale our app we can start replacing these in-process queues with distributed queues.
Systems built in this way are often easier to debug as each component can be tested on its own before attaching it via channels to other components.
Jafar Husain: Rx is easy to learn because it leverages a developers existing knowledge of how to compose collections using transformation functions like map, filter, and reduce. In fact the hardest part is unlearning the idea that events are somehow different than the collections you already work with every day.
RX also helps developers avoid common pitfalls like memory leaks. When building conventional event-based systems developers commonly rely on state machines to determine when to unsubscribe from events. Rx allows you to build event streams that declaratively specify the conditions under which they end. Once an event stream ends, it cleans up any outstanding subscriptions for you.
InfoQ: For what kind of problems is your library the best solution, or what was the original motivation to implement it? And when would you use something different?
Timothy Baldridge: Core.Async succeeds very well in systems where a program needs to interface with other systems in an asynchronous fashion. These systems could be external queueing services, databases or even the HTML DOM. However, this model does introduce a small amount of overhead. Therefore it would probably be unwise to use this library when the only goal is parallelism. For example: I wouldn’t use core.async to build a 3D raytracer. If you have a problem that is synchronous and embarrassingly parallel, there are better options available.
Jafar Husain: The idea for Rx came about when Erik Meijer and Brian Beckman noticed the fundamental correspondence between the Iterator pattern and the Observer pattern. Each pattern enables a data producer to send data to a data consumer progressively, one item at a time. The difference is that in the Iterator pattern the consumer is in control, pulling data from the producer, whereas in the Observer pattern the producer is in control, pushing data to the consumer.
Erik and Brian noticed a curious omission in the Observer pattern. In the Iterator pattern, there is a well-defined way for the producer to signal to the consumer that the sequence of data has ended, but in the Observer pattern there is no such semantic. For example, there is no well-defined way for a DOM event to signal to a consumer that no more data will arrive.
Erik and Brian realized that these two ubiquitous design patterns could be unified if they simply added the notion of completion to the Observer pattern. The result was a new type: an Observable. These two types are duals, which means that any operator that can be used to compose an Iterable can also be defined for an Observable. This means it is possible to query event streams just like databases.
Viktor Klang: Akka was created by Jonas Bonér as a way to bring a lot of the good things from Erlang onto the JVM—specifically having independent “processes” that communicate via messages and dealing with failure through Supervision.
One driving principle behind Akka is that all of the fundamental operations are location transparent, which means that it does not matter where the Actor is located, for the purpose of ending it messages or supervising it. This makes Akka the perfect tool for distributing an application across multiple machines, both for increased processing power and for resilience.
InfoQ: Do I need to rethink the way I program? That is, are there any restrictions, can I only use pure functions, do I wrap everything in a monad or have to pass continuations?
Timothy Baldridge: Thankfully monads or continuation passing are not required. The lightweight threads in Core.Async are supported via a code-rewriting macro. The users of the library simply wrap their code in a “go” block, and the macro does the rest; rewriting the code into callbacks that are attached to channels. This is one of the most powerful features of the library, it allows programmers write code that looks much like the imperative code they write every day.
The one area that does take a little thought is when it comes to IO. The library makes a distinction between logical threads that are backed by OS threads, and those that are not. These two types of threads work better for different tasks. Dedicated threads are often used for IO while lightweight threads are recommended for more CPU intensive tasks.
Viktor Klang: I’d say at first you change the way you program by starting to think about communication—the protocol of messages that will flow between your Actors.
A quick and fun way of doing that is by thinking about how you would solve it with people—because humans coordinate by sending each other messages, whether it be email, spoken language, instant messaging or otherwise. We don’t flip other people’s neurons directly! (compare to shared memory concurrency.)
Another interesting change is how you can now start to decompose your architecture into individual pieces that can run independently of each other.
The second change is to think about failure management, with the delegation of risky operations to shield the important parts of your application from cascading failure.
Since Actors run concurrently with other Actors, it is important to not share mutable state between Actors—Actors should only interact via messages.
But the code that runs inside of the Actor—its behavior—is normal, call-me-some-methods, kind of code.
Jafar Husain: If you already know how to use functional composition to transform collections, you don't need to change the way that you program. It's worth pointing out that Observable is actually a slightly rephrased version of the continuation monad, and side effects are delayed until you subscribe. However in practice, developers don't really need to know about monads to use RX. At Netflix, we train developers to use this technology without ever mentioning the "M" word.
InfoQ: Does your programming language or platform bring any benefits, or are there things it complicates or makes outright impossible? Were there particular implementation challenges that you'd like to share?
Viktor Klang: Akka Actors makes it simple to create applications which can grow and shrink as needed—it works both in the small and in the large, as an example, we recently did an experiment on Google Compute Engine where we ran 2500 Akka nodes in a single cluster.
Scala, the language in which Akka is implemented, has very good support for immutable, efficient data structures and makes it very simple to create immutable message types with “case classes”.
Speaking of challenges—two come to mind: creating lock-free versions of most of the concurrency coordination; and the research and implementation of Akka Cluster, which is a long and interesting story in itself.
Jafar Husain: Obviously it is easier to write in a functional style if your programming language has closures. This was perhaps the biggest obstacle to porting Rx to Java. Ben Christensen, who is responsible for Netflix's popular open source Java port of Rx, had created adapters for a host of languages like Scala, Clojure, and Groovy to make RX easier to use on the JVM. Using Rx will also get much easier in Java with the release of Java 8, which introduces closures to the language for the first time.
Timothy Baldridge: The Clojure language supports a very advanced macro system that allows programmers to extend the language as needed. This has allowed us to write the entire core.async library without once touching the Clojure compiler. The added benefit to this was that porting the library to ClojureScript took a matter of hours, instead of days or weeks.
This in my opinion shows the true power of Clojure and core.async. To be able to write asynchronous code for Clojure on the JVM and then have it “just work” on ClojureScript in a browser is extremely powerful.
About the Panelists
Timothy Baldridge (@timbaldridge) is a developer with Cognitect Inc. He hails from the mountain regions of Denver Colorado (USA). Timothy is a polyglot programmer with experience in Clojure, C#, Python, and Erlang. Most recently he was deeply involved in the development of Clojure's Core.Async library, where he designed, implemented, and maintains the state-machine code rewriting
macro known as "go".
Viktor Klang (@viktorklang)—Chief Architect at Typesafe and former Tech Lead for the Akka project—has a long background on the JVM and currently has a passion for concurrent, asynchronous, distributed and resilient programming.
Check out the Jeuron Application Platform at www.jeuron.org
Srini Penchikala Aug 21, 2014