InfoQ Homepage Articles Building Reactive Systems Using Akka’s Actor Model and Domain-Driven Design

Java

Building Reactive Systems Using Akka’s Actor Model and Domain-Driven Design

Oct 25, 2017 23 min read

Follow us on

Youtube232K Followers

Linkedin26K Followers

Key Takeaways

Actor-oriented programming is an alternative to object-oriented programming
Development of highly concurrent systems is easy using actors
Actor systems are not constrained to a single process on a single node but run as a distributed cluster
Actors and the actor model are what “being reactive” is all about
Actors and Domain Driven Design are a perfect match

With the explosion of mobile and data-driven applications, users are demanding real-time access to everything everywhere. System resilience and responsiveness are no longer "nice to have"; they're essential business requirements. Businesses increasingly need to trade up from static, fragile architectures in favor of flexible, elastic systems.

Thus the burgeoning popularity of reactive development. To support reactive development, actor models, together with domain-driven design can fulfill modern resiliency requirements.

The Actor Model, Some History

The actor model was first conceived in the early 70's with the advent of Smalltalk, not long after Object-Oriented Programming (OOP) itself appeared on the scene.

Around 2003, the nature of computing underwent a fundamental shift, as processor speeds started to top out. Over the next decade and a half, clock speed advancements would be incremental, not exponential, as they had been in the past.

But user demand continued to grow, and the computing world had to find some way to respond, hatching the multicore processor. Processing gains became a "team effort", squeezing out efficiencies via communication between multiple cores, rather than through traditional clock speed acceleration.

This is where threading comes in, a concept that is far more complex than it looks. Consider an example of a simple counter representing a scarce resource, such as the number of items in inventory, or tickets available for sale for an event. In this example, there could be many simultaneous requests to obtain one or more of the items or tickets.

Consider this commonly used implementation where one thread handles each purchase request. With this approach, there is the possibility that many concurrently running threads will each attempt to adjust the counter. In order to meet the semantic requirements, our model must ensure that only one thread at a time decrements the counter. The reason for this is that the decrement operation involves two steps:

Check that the counter is currently greater to or equal to the desired counter decrement
Decrement the counter

Here is an example of why the two steps must be completed as a single operation. Each request represents a purchase of one or more items for sale or requests to buy one or more tickets. Say two threads are concurrently trying to adjust the counter, which currently has a value of 5. Thread one wants to decrement the counter by 4. Thread two wants to decrement the counter by 3. They both check the current value of the counter and verify that it is greater than the decrement amount. Then they both go ahead and decrement the counter. The result is 5 - 4 - 3 = -2. The result is the item is over allocated, which in this case violates the specified business rules.

A naive implementation for preventing this type of over allocation would be to perform the checking and decrementing steps in a single atomic operation. Locking the two steps together into a single operation eliminates the possibility of purchasing something when it is sold out, such as two threads attempt to purchase the last item at the same time. Without locking, there is the possibility that multiple threads are simultaneously first checking that the counter is greater than or equal to the desired purchase amount and then they all incorrectly decrement the count, which results in a negative value.

The potential problem with this one-thread-at-a-time approach is that during periods of high contention, there is the possibility for a lengthy queue of threads each waiting for their turn to decrement the counter. A real life example of this is a line of people waiting to purchase tickets to an event.

One of the big downsides of this approach is the potential for many blocked threads, each waiting in single file for their turn to perform a serialized operation.

If application designers aren't careful, the inherent complexity bears the real risk of turning a multicore processor, multithreading application into essentially a single-threaded application, or at least one with a high level of contention among working threads.

A Better Solution for Multithreaded Environments

The actor model elegantly solves this dilemma, providing a foundation for truly multithreaded applications. The actor model is designed to be message-driven and non-blocking, with throughput as part of the natural equation. It gives developers an easy way to program against multiple cores without the cognitive overload typical in concurrency. Let’s see how that works.

Actors consist of senders and receivers; simple message-driven objects designed for asynchronicity.

Let's revise the ticket counter scenario described above, replacing a thread based implementation with actors. An actor must of course run on a thread. However, actors only use threads when they have something to do. In our counter scenario, the requestors are represented as customer actors. The ticket count is now maintained with an actor, and it holds the current state of the counter. Both the customer and tickets actors do not hold threads when they are idle or have nothing to do, that is, have no messages to process.

To initiate a buy operation, a customer actor sends a buy message to the single tickets actor. Such buy messages contains the quantity to be purchased. When a tickets actor receives a buy message, it verifies that the purchase amount does not exceed the current remaining count. If the buy request is valid, the count is decremented, and the tickets actor sends a message to the customer actor indicating that the order has been accepted. If the buy amount exceeds the count, the counter actor sends the customer actor a message indicating that the order was rejected. The actor model itself ensures that the processing is handled synchronously.

In the following diagram, we show a few customer actors each sending a buy message to the tickets actor. These buy messages are initially queued in the tickets actor's mailbox.

Figure 3 - Customer Actors Sending Buy Messages

The ticket actor processes each message. Here the first message is a request to buy five tickets.

Figure 4 - Tickets Actor Processing Messages

The ticket actor checks that the buy amount does not exceed the remaining ticket count. In this case, the ticket count is currently 15 so the buy request is approved, the count is decremented, and a message is sent from the tickets actor to the requesting customer actor indicating that the tickets have been purchased.

Figure 5 - Ticket Actor Processing Message Queue

The tickets actor processes each of the messages in its mailbox. Note that there is no need for complicated threading or locking here. This is a multi threaded process, but the actor system manages the use and allocation of threads.

In the following diagram, we see how the tickets actor handles requests that exceed the remaining ticket count. Shown here is a request to buy two tickets when only one ticket is available. The ticket actor rejects this purchase request and sends the requesting customer actor a sold out message.

Figure 6 - Ticket Actor Rejecting Buy Request

Of course, experienced thread level developers know the two-step action of checking, and decrementing the ticket counter is easily implemented as a synchronized sequence of operations; in Java for example, using synchronized methods or synchronized statements. However, the actor based implementation not only provides natural synchronization of operations within each actor, but it also eliminates the potentially large backlog of threads waiting for their turn at the synchronized section. In the ticket example, each customer actor waits for a response without holding a thread. The result is that the actor based solution is easier to implement, and it results in potentially significant reductions in system resources utilization.

Actors: The Rightful Heirs to the Object Throne

The idea of actors being the natural successors to objects is nothing new, and is in fact not all that revolutionary. Smalltalk inventor Alan Kay defined many of the object paradigms still in use. He emphasized the importance of messaging, and went so far as to say that the internal implementation of the objects was secondary.

Even though Smalltalk wasn't originally asynchronous, it was still message-based, where one object would in essence send a message to another object to get anything done. The modern actor model is thus adhering to Alan Kay's earliest ideas of objects oriented design.

Following is a sample Java implementation of an actor in Akka’s actor system (we assign a unique “magic” serial number to each actor to demonstrate state, as we will see).

public class DemoActor extends AbstractActor {

  private final int magicNumber;

  public DemoActor(int magicNumber) {
    this.magicNumber = magicNumber;
  }

  @Override
  public Receive createReceive() {
    return receiveBuilder()
      .match(Integer.class, i -> {
        getSender().tell(i + magicNumber, getSelf());
      })
      .build();
  }

  public static Props props(int magicNumber) {

    // Akka Props is used for creating Actor instances
    return Props.create(DemoActor.class, () -> new DemoActor(magicNumber));
  }
}

Note that the actor is implemented as a class that extends an Akka abstract base class. The implementation of the actor must override one method, the createReceive method, which is responsible for creating a message-receive builder, defining how incoming message objects sent to this actor implementation are handled.

Note also that this actor is stateful. Actor state can be simple, as the magic number in this example, or it can be much more sophisticated.

To create an instance of an actor, we need an ActorSystem. Once an ActorSystem has been started, creating actors typically requires a single line of code.

ActorSystem system = ActorSystem.create("DemoSystem");
ActorRef demo = system.actorOf(DemoActor.props(42), "demo");

The return value from actor creation operation is an actor reference. This actor reference is used to send messages to the actor.

demo.tell(123, ActorRef.noSender());

The above shows some of the basic steps for defining, creating running instances, and sending messages to actors. Of course, there is more to it than this simple example but for the most part developing systems using actors requires learning how to implement applications and services designed as a system of actors interacting with each other by exchanging asynchronous messages.

Better Applications for Better Networks

Looking beyond cores and threads, today's environment also allows developers to take advantage of very high-speed storage devices, lots of memory, and a multitude of highly-scalable, widely-connected devices. This technology all communicates through fairly affordable cloud hosting solutions and fast networks.

But as systems become more distributed, increased latency is a given. Distributed systems can be interrupted by downtime or a partition on the network, caused perhaps by one or more servers going out of commission, producing, latency. The object model is ill-suited to deal with this issue. Because every request and every response is asynchronous, the actor model helps developers address this problem.

With the actor model, reduced latency comes for free. Because of this, immediate results aren't expected and the system only reacts to messages when they are sent or received. When latency degradation is detected, the system automatically reacts and adjusts rather than shutting down.

A distributed clusters of nodes each running subsets of actors is the natural environment for actors interacting with each other via asynchronous messages. Adding to the basic capabilities of actors is the fact that the message senders and the receiving actors are not constrained to a single JVM process boundary. One of the best features of Akka is that you can build systems that can run in a cluster. An Akka cluster is a set of nodes running in independent JVMs. Programmatically it is just as easy to send a message to an actor in a local JVM as it is to send a message to an actor running in another JVM. As shown in the following diagram, actors distributed on multiple cluster nodes can send message to other actors on other cluster nodes.

Running in a clustered environment adds an entirely new dynamic to the architecture of an actor system. It is one thing to be running on a single server, in a single process, and within a single JVM. It is entirely another thing to be running a system that spans a cluster of JVMs spread across a network.

In a single JVM, with actors running in an actor system, the JVM is either running or not running. On the other hand, when running in a cluster, at any point in time the topology of the cluster may change. Cluster node JVMs may come and go at a moment’s notice.

The cluster itself is technically up as long as at least one node is up. Actors on one node may be happily exchanging messages with actors on other nodes then, without warning, a node goes away taking down the actors that were resident on that node. How are the remaining actors supposed to react to these changes?

The loss of a cluster node impacts the exchange of messages both to message senders and message receivers.

For message receivers, there is always the possibility that an expected message will never be received. The receiving actor needs to take this into consideration. There needs to be a plan B. The fact that expected messages may not be received is a fact of life with asynchronous messaging. It is also true that in most cases dealing with lost incoming messages does not require cluster awareness.

On the other hand, it is often necessary for message senders to have some level of cluster awareness. Router actors can handle the logistics of sending messages to other actors that may be distributed across the cluster. A router actor receives messages, but it does not handle the message itself. It forwards the message to a worker actor. These worker actors are often referred to as routees. The router actor is responsible for routing messages to other routee actors based on a routing algorithm. The actual routing algorithm used varies based on the specific requirements of each router. Examples of routing algorithms are round robin, random, smallest mailbox, etc.

Consider the example scenario shown in the following diagram (recall that the client that sends a message to an actor has no idea how that actor will handle the message.) The receiving actor is a black box from the perspective of the client. The recipient actor may delegate the work to be done to other worker actors. The recipient actor, in this case, could be a router. It routes incoming messages to delegate routee actors that do the work.

In this example scenario, the router actor could be cluster aware, and it could be routing messages to actors that are distributed across nodes in the cluster. So what does it mean to be cluster aware?

Cluster aware actors use information about the composition of the current cluster state to make decisions about how to route incoming messages to other actors that are distributed across the cluster. One of the most common use cases for cluster aware actors is routers. Cluster aware routers decide how to route messages to routee actors based on the current state of the cluster. For example, a router that knows the location of routee actors that are distributed across the cluster routes messages to routees based on a distributed work algorithm.

How the Actor Model Supports Reactive Systems

As defined in the Reactive Manifesto, "Reactive is responsive, Reactive is resilient, Reactive is elastic, and Reactive is message-driven." The message-driven component is essentially what allows the other three characteristics of Reactive to be supported.

Boosting System Responsiveness

Reactive is responsive in that systems can dynamically adapt to changing user demands. It's not uncommon for a reactive system to be able to respond to user requirements in a request/response fashion. In supporting reactive through the actor model, developers are able to achieve very high throughputs.

Bolstering System Resiliency

Resiliency is also supported through message-passing and other functionality offered by message-driven architecture. When a client actor sends a message to a server actor receiver, the client does not have to deal with exception handling that may be caused within that server object or actor.

Consider a typical object-oriented architecture, where a client sends a message or invokes a method on a receiver, forcing the client to deal with any sort of crash or exception being thrown. In response, a client would typically rethrow or toss the exception up to some higher-level component and hope that someone deals with it. But clients are ill-suited to fixing server crashes.

In the actor model, especially with Akka, there is a hierarchy that's established for supervision. When a server crashes or throws an exception on an incoming message, it's not the client that has to deal with the crash but the parent of the server actor or object that takes care of it.

The parent is in a much better position to understand the possibilities of a crash on its child actor, and so can react to it and restart that actor. Thus the client only has to deal with the knowledge that it has either received a response to its request or it has not. And based on a timer or scheduled event, it can ask the same actor for the same request to be handled again if it doesn't receive a response within an acceptable timeframe. So actor systems are (when built correctly!) very resilient.

Here is an example that shows actor supervision in action. In Figure 7, actor R is a supervisor that has created four worker actors. Actors A and B are actors that send messages to actor R requesting that actor R perform some action. Actor R delegates the work out to one of its available worker actors.

Figure 7 - Actor A Message Delegated by R to Worker Actor

In this example, the worker actor runs into a problem, as shown in Figure 8, and throws an exception. The exception is handled by the supervisor; actor R. Supervisor actors follow a well-defined supervision strategy for handling worker errors. The supervisor may elect to simply resume the actor in the case of simple problems, or it may restart the worker, or stop it, depending on the severity and recovery strategy.

Figure 8 - Worker Actor Throws an Exception

While the exception is handled by the supervisor, actor A is expecting a response message. Note that actor A is merely expecting a message and not awaiting a message.

This exchange of asynchronous messages introduces some interesting dynamics. Actor A hopes that actor R reacts as expected to its messages. However, there is no guarantee that actor A's messages will be processed or that a response message will be returned. Any number of problems may occur that will break this asynchronous request and response cycle. For example, consider the case where actor A and actor R are running on different nodes, and messages between actor A and actor R are sent across a network connection. The network may be down, or the node where actor R is running may fail. Or possibly the task to be performed may fail, for example if a database operation fails due to a network failure or down server.

Given that there are no guarantees, a common approach to handle this is for actor A to be implemented to expect two possible outcomes. One outcome is that when it sends a message to actor R, it eventually receives a response message. The other possible outcome is that actor A may also expect to receive an alternate message that indicates that the expected response has not been received. This approach involves actor A sending two messages: one is a message to actor R and the other is a message to be sent to itself at some specified time in the future.

Figure 9 - Actor A Receives a Timeout Message

The basic strategy used here is that there is a plan A and a plan B. Plan A is that everything works as expected. Actor A sends messages to actor R, the expected task is performed, and a response message is returned to actor A. Plan B handles the situation where actor R is unable to process the request message.

Expanding System Elasticity

Reactive systems are also elastic. They can grow and shrink according to current demand. Truly reactive designs have no contention points or central bottlenecks, so you can shard or replicate components and distribute inputs among them. They enable predictive and reactive scaling algorithms by providing relevant live performance measures.

Actor models support this by dynamically responding to the peaks and valleys of user activity, intelligently ramping up performance when needed and reserving power during periods of low usage. Their message-driven nature inherently leads to a greater degree of elasticity.

Elasticity requires two key ingredients. One is a mechanism for expanding and contracting the processing capacity of the system as the load goes up and down. The second is a mechanism that allows for the system to react appropriately as the capacity of the system changes.

There are many ways to handle expanding and contracting the processing capacity. In general, capacity changes are handled manually or automatically. An example of a manual process is to increase the processing capacity in preparation for a seasonal peak in customer traffic. The classic examples are Black Friday and Cyber Monday or Singles Day. Automatic scaling is a useful feature provided by many cloud providers, such as Amazon AWS.

The actor model and the actor model implementation Akka do not provide any mechanisms for triggering processing capacity adjustments, but it is an ideal platform for building systems that react appropriately when the cluster topology changes. As discussed previously, at the actor level it is possible to implement cluster aware actors that are specifically designed to react when nodes leave or join the cluster. As a happy circumstance in many cases when you design and implement actors systems to be more resilient you are also laying the foundation for elasticity. When your system can handle distributed nodes leaving the cluster due to failures and new node joining the cluster to replace failed nodes, there is no difference if the nodes are leaving and joining due to failures or due to making adjustments to the available processing capacity as a reaction to changes in the level of activity.

The Message Matters

As we said, the actor model is focused on direct asynchronous messaging. To support this it is necessary for a sender to know the receiver's address, in order to send a message to the receiver.

Actors are lock-free and they share nothing. If three senders were to each send a message simultaneously to a receiver, the receiver would enqueue those messages in its mailbox and process them one at a time. Therefore the receiver would not need to lock internally to protect its state from multiple threads operating on it at once. And the receiver would not share its internal state with any other actor.

The actor model also presents the opportunity to prepare receiving actors to handle their next message. For example, suppose there are two program flow sequences. When a sender in the first sequence sends a message to a receiver, that receiver reacts to that message, and then transforms to another kind of message listener. Now when a message in the second sequence is sent to the same actor, it responds using a different set of logic in its receive block (the actor swaps the message receiving logic when it decides to change state. There is an example of this in the Akka documentation).

Doing More with Less

Another important issue the actor model helps solve is doing more with less. Systems of all sizes can benefit—from massive networks like those used by Amazon and Netflix, down to much smaller architectures. The actor model allows developers to squeeze the most out of every server, presenting a high potential to scale back clusters.

How many actors can an actor-based service have? Fifty? One hundred? Perhaps! Actor-based systems are so flexible and elastic, they can potentially support an enormous number of actors, into the many millions.

In a typical N-Tier architecture, or what might be called a "ports-and-adapters" or hexagonal architecture, there is a lot of unnecessary complexity, maybe even accidental complexity. One of the biggest advantages of the actor model is that it can collapse a lot of this complexity and drop down to say one set of "controller" adaptors on the boundary. The controllers can delegate by sending messages to a domain model, and the domain model emits events. In this way, the actor model greatly reduces network complexity, allowing designers to achieve more, with limited resources and budget.

Accelerating Business with Domain-Driven Design

The essence of domain-driven design (DDD) consists of modeling a ubiquitous language in a bounded context. Let me explain: think about modeling a service of some kind as a bounded context. That bounded context is a semantic boundary where everything inside it, including the domain model, has specific definitions, comprising a language that's spoken by teammates to help developers understand the meaning of each of the concepts in the bounded context.

This is where context-mapping comes into play; context-mapping models how each of the bounded contexts correspond to one another, what the team relationships are, and how the models interact and integrate. Leverage in context-mapping is also very important because bounded contexts tend to be much smaller than what many are accustomed to within a monolithic mindset.

Responding to rapid new business direction is a big challenge for just about any enterprise and their development teams. DDD enables the necessary knowledge crunching in dealing with these evolving business directions. And actors and messages potentially help developers rapidly implement the domain models in response to those demands, and to have clear understanding about the domain models.

Actors and DDD: A Perfect Match

To quote Alan Kay, "The Actor model retained more of what I thought were the good features of the object idea." Also from Kay: "The big idea is messaging." In creating a ubiquitous language, developers can focus on the actors as being the objects or the components, the elements of the domain model, and the messages sent between them.

One reactive service is no service; reactive services come in systems. And so what developers are ultimately trying to accomplish is to build full systems, not just single services. By emitting domain events to other bounded context or microservices, developers can accomplish this more easily.

Why Actors are Better for Business

Actors are ideal for use with DDD, as they speak the ubiquitous language of the core business domain. They're designed to gracefully handle business failure, maintaining system resilience and responsiveness no matter what's occurring on the network. They help developers reactively scale systems to meet concurrency demands, elastically growing up and out to handle peak loads and shrinking when traffic is lighter, thus minimizing infrastructure footprint and hardware needs. It's a model that's more appropriate for today's highly-distributed, multithreaded environments, and one that can generate business benefits that go far beyond the server room.

About the Authors

Markus Eisele is a Java Champion, former Java EE Expert Group member, founder of JavaLand, reputed speaker at Java conferences around the world, and a very well known figure in the Enterprise Java world. He works as a developer advocate at Lightbend. Find him on Twitter @myfear.

Hugh McKee is a developer advocate at Lightbend. He has had a long career building applications that evolved slowly, that inefficiently utilized their infrastructure, and that was brittle and prone to failure. That all changed when we started building reactive, asynchronous, actor-based systems. This radically new way of building applications rocked his world. As an added benefit, building application systems became way more fun than it had ever been. Now he is focused on helping others to discover the significant advantages and joys of building responsive, resilient, elastic, message-driven applications.

InfoQ Software Architects' Newsletter