Jonas Bonér on Reactive Systems Anti-Patterns
Taking the opportunity offered by the update to the Reactive Manifesto, InfoQ asked Jonas Bonér, Typesafe CTO and original author of the first Reactive Manifesto, some questions about his vision of “Reactive” applications. Jonas offered his thoughts about both desirable features of reactive applications and what is not reactive programming.
InfoQ: Could you please explain your view of service size and behavior in the Reactive model?
Jonas: Service size and management is really a trend that we see changing in a Reactive direction. Docker, for example, is a container that gives much better control of granularity, and a great example of the trend towards smaller units of managed services. When I talk about service size I don’t refer to lines of code or bytes but to the single responsibility principle: doing one thing and doing it well. By having a simple contract that a service lives up to, it can more reliably handle that one thing. Relying on simple protocols makes the services easier to compose and to keep isolated from each other, which makes them easier to write, understand, maintain, deploy and upgrade. So to sum things up: small services with high integrity and cohesion that you can compose but still manage in isolation. That’s the vision of the whole microservices movement, and that is in the spirit of Reactive.
InfoQ: How is this different than how services are usually composed today?
Jonas: In classic JEE apps, services are written in a very monolithic way. That ties back to a strong coupling between the components in the service and between services. It makes it hard to understand the system, with the services tangled and dependent. It’s also very hard to evolve a system like that, because services that are coupled in the monolith cannot evolve independently. In a non-Reactive world, since all these services are tightly coupled, you need to upgrade all of them at once. And in terms of managing failure, when one service fails it can take down the entire app, instead of allowing you to deal with the failure in isolation.
App servers (WebLogic, JBoss, Tomcat, etc.) really encourage this monolithic model. They assume that you are bundling your service JARs into an EAR file as a way of grouping your services, which you then deploy, alongside all your other applications and services, into the single running instance of the app server, which manages the service “isolation” through class loader tricks. It is a very fragile model.
InfoQ: In the manifesto you talk about how reactive applications should meet failure “with elegance” rather than disaster. What do you mean by that?
Jonas: Yes, error handling is a great example of how failure management is an afterthought for most applications. The two main problems I see are first, poor isolation and containment of errors and second, that all types of errors are sent right back to the client in a synchronous fashion.
I like to give an analogy about a vending machine. A guy wants to buy a coffee, so he walks up to a coffee machine and is supposed to put two quarters in to get a coffee. If he only puts one quarter in, not much will happen, since he has not fulfilled the contract of the service. So the machine—instead of returning coffee—displays a validation error telling the user: “please give me another quarter”. This is what you would expect. The user of the coffee machine is responsible for fulfilling his part of the service contract. Most applications do a good job of presenting validation errors and handling “failure” at that level.
But what happens when he puts in two quarters, but the coffee machine doesn’t work because the beans are jammed in the grinder? You would not expect the machine to respond with a message telling the user to open it up and disassemble it in order to fix the problem. This is not the user’s responsibility. Instead (ideally) the machine would send a notification to a vending machine service guy who can come and fix the problem.
It’s an oversimplified analogy, but the point is that this separation of validation errors and application errors is very important, yet something we often see missing and confused in JEE applications. I don’t believe that application errors should be thrown into the user’s face, but still that is what Java (most languages for that matter) expects you to do with its synchronous exceptions (blowing your call-stack) and try-catch statements as the only tool for error handling. It forces you into a model where you need to program very defensively and be prepared for anything to blow up anywhere—since any service or method call can, at any point in time, return with an application error. As a result of this flawed model we often see applications where the error handling is scattered all over the application and tangled with the business logic in an incomprehensible mess.
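The coffee machine analogy can be sketched in a few lines of Python. This is only an illustration of keeping validation errors and application errors apart, not code from the interview: the names `VendingMachine`, `GrinderJammed`, `technician_inbox` and the 50-cent price are assumptions made for the example.

```python
from dataclasses import dataclass, field

PRICE_CENTS = 50  # two quarters, as in the analogy (assumed value)


class GrinderJammed(Exception):
    """Application error: an internal fault the customer cannot fix."""


@dataclass
class VendingMachine:
    grinder_jammed: bool = False
    # Hypothetical notification channel to the service technician.
    technician_inbox: list = field(default_factory=list)

    def _brew(self) -> str:
        if self.grinder_jammed:
            raise GrinderJammed("beans are jammed in the grinder")
        return "coffee"

    def buy(self, inserted_cents: int) -> str:
        # Validation error: the customer has not fulfilled the contract,
        # so the customer is told exactly what is missing.
        if inserted_cents < PRICE_CENTS:
            return f"please insert {PRICE_CENTS - inserted_cents} more cents"
        try:
            return self._brew()
        except GrinderJammed as fault:
            # Application error: the details go to whoever can repair the
            # machine; the customer only gets a neutral answer.
            self.technician_inbox.append(str(fault))
            return "out of order, money refunded"
```

A customer who inserts one quarter gets a validation message; a customer who pays in full at a jammed machine gets a refund notice, while the technician's inbox receives the actual fault.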
InfoQ: Then, what does reactive error management look like?
Jonas: A Reactive approach is able to first isolate and contain the error to prevent it from spreading out of control (which can lead to cascading failures, taking down the whole application) and instead capture it at its root, allowing fine-grained failure management and self-healing. Second, it allows you to reify the error as a message and send it to the most suitable receiver, the component best suited to managing the failure (usually called the component’s Supervisor), not just right back to the user of the service. Now, if the error is just an ordinary message then it can be managed just like any other message: sent asynchronously, to one or many listeners, even across the network for full resilience. This means that failure is no longer something exceptional, but part of the normal message workflow, giving you a natural way to design for failure; a model sometimes called “embrace failure” or “let it crash”.
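To make "reify the error as a message" concrete, here is a minimal single-threaded sketch in Python. It is not Akka's API; `Failure`, `Worker`, `Supervisor` and the restart-by-replacement policy are hypothetical names and choices made for illustration only.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """A failure reified as an ordinary, immutable message."""
    component: str
    reason: str


class Worker:
    def __init__(self, name, supervisor_mailbox):
        self.name = name
        self.supervisor_mailbox = supervisor_mailbox
        self.processed = []

    def handle(self, message):
        try:
            if message == "bad":
                raise ValueError("cannot process 'bad'")
            self.processed.append(message)
        except Exception as error:
            # Contain the error at its root and report it as a message
            # to the supervisor instead of throwing it at the caller.
            self.supervisor_mailbox.put(Failure(self.name, str(error)))


class Supervisor:
    def __init__(self):
        self.mailbox = queue.Queue()
        self.restarts = 0
        self.worker = Worker("worker-1", self.mailbox)

    def supervise(self):
        # Drain pending failure messages; the (assumed) policy here is to
        # replace the failed worker with a fresh instance.
        while not self.mailbox.empty():
            failure = self.mailbox.get()
            self.worker = Worker(failure.component, self.mailbox)
            self.restarts += 1
```

The caller of `handle` never sees an exception. The failure travels through the same kind of mailbox as any other message, so in a fuller system it could just as well be delivered asynchronously or to several listeners.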
InfoQ: State is a major challenge for scale. What’s the opportunity for state to be handled in a more reactive way?
Jonas: The biggest impediment for scale is shared mutable state (to be precise: contended access to shared mutable state). As soon as you have shared mutable state, you need to guard that state through a gateway of serial access, which means adding coordination and mutual exclusion, and that adds contention: services waiting in line for access to the shared state. Contention is the biggest scalability killer.
The way people approach state today, they want to continue to program as if everything were still running on a single CPU, where they have full control of the ordering of the instructions. That’s a nice model because it’s easy to understand, and it was true 15 years ago. But we shouldn’t lie to ourselves any longer; it’s a vastly different world now, and we need to rethink how we design and think about software. The current reality doesn’t match our beloved von Neumann Architecture anymore, and hanging on to it by trying to emulate it will just make matters worse.
But unfortunately, way too often I have seen that instead of addressing the problem at its root cause and simplifying it by applying the right design and principles from the start, people keep adding layers of complexity by bringing in more tools and products in an attempt to keep their mutable state in sync. The problem with this is that it doesn’t scale: the more nodes (or cores) you add, the more nodes need to be part of that consistent view, and that’s more and more costly and will make the system run slower and slower.
The antidote is share-nothing designs. In a share-nothing architecture components do not share state; every component (or node) is fully self-contained, lives in isolation with its own life-cycle, and communicates by sending immutable messages. If you rely on share-nothing designs then you will both minimize contention and maximize locality of reference. This means that things used often together are sitting together, and not just conceptually but in code, which simplifies caching: fewer CPU cache line invalidations, better prefetching, as well as more efficient application-level caching. It also means minimizing the waiting time in the system by decreasing contention, which makes things more efficient in terms of resource utilization. Now, adding more CPUs and/or nodes just helps, since you have partitioned the system and removed most bottlenecks.
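A share-nothing design of this kind might be sketched as follows in Python. It is a toy under stated assumptions, not a production pattern: `Add`, `Partition` and `route` are hypothetical names, and routing by the byte sum of the key is chosen only because it is deterministic and easy to follow.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Add:
    """Immutable message: add `amount` to the total stored under `key`."""
    key: str
    amount: int


class Partition(threading.Thread):
    """Owns its state exclusively; the inbox is the only way in."""

    def __init__(self):
        super().__init__()
        self.inbox = queue.Queue()
        self.totals = {}  # touched only by this thread while it runs

    def run(self):
        while True:
            message = self.inbox.get()
            if message is None:  # sentinel: stop processing
                break
            self.totals[message.key] = (
                self.totals.get(message.key, 0) + message.amount
            )


def route(partitions, message):
    # The same key always lands on the same partition, so no two
    # partitions ever hold (or contend for) the same piece of state.
    index = sum(message.key.encode()) % len(partitions)
    partitions[index].inbox.put(message)
```

No locks guard `totals`, because each dictionary has exactly one writer. To use the sketch, start a few `Partition` threads, send `Add` messages through `route`, put `None` on every inbox to stop them, and read the totals after `join()`. Adding partitions adds capacity without adding coordination between them.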
InfoQ: What about applications that really need to share data among their components?
Jonas: It is true that most applications have some need for strong consistency (linearizability), but it’s the wrong default. It is not uncommon that the part of your application’s data that needs such strong consistency guarantees is fairly small. Why then pay such a high price for all of your data? It is sad to see that most people still reach for their RDBMS whenever they want to persist or coordinate data, just out of habit and/or resistance to change, without thinking through the requirements of that specific data set in terms of consistency and integrity. My advice would be to start off by trying to make your problem fit an eventually consistent, share-nothing design, with components communicating through asynchronous message-passing, and in the few places where you need strong consistency, bite the bullet and pay the price. Then you will end up with an architecture with very few bottlenecks, great scalability and elasticity characteristics and no single point of failure.
InfoQ: In which way do you hope the Reactive Manifesto 2.0 will help developers to get reactive principles right?
Jonas: I hope that it will make it clear how the four traits of Reactive (Responsive, Resilient, Elastic and Message-Driven) are essential building blocks: how they support and complement each other, and how none of them work well in isolation. I hope that it will help the reader to go beyond the buzzwords and hype and really understand that by going back to the basics of computer science, distilling the essence and relying on a few, but the right set of, solid and proven principles, they can simplify their system's design immensely and be ready to tackle the challenges we, as an industry, are faced with today.
InfoQ: How would you describe the reactions to the Reactive Manifesto since its initial definition?
Jonas: I have mainly seen positive reactions to the Reactive Manifesto and it has to this date been signed by more than 7600 people. It has sparked great discussions around Reactive, as well as the manifesto itself, which has led to constructive criticism of the first versions of the manifesto.
We are grateful for this feedback and it is what has triggered the complete rewrite of the Reactive Manifesto that we see today. The new version is much shorter and more concise; it tells a more coherent and consistent story while being less prescriptive. The goal has always been to create a living document that evolves and improves over time (which is why it resides on GitHub, comment and pull-request friendly) and I hope that it will continue to do so. If there's been a criticism, it's that it's self-evident, which might be true for some people. But what is self-evident to the enlightened can sound like magic to the average Joe. The goal is not to preach to the choir but to reach out to the masses and show them a better way. As we conclude the manifesto: "It is time to apply these design principles consciously from the start instead of rediscovering them each time."
InfoQ: One year later, what value do you still see in the Reactive Manifesto?
Jonas: I think it has helped in a number of ways. Raising awareness of the challenges we have today, and of those that are waiting in even bigger hordes around the corner. Moving the focus and discussion from tools and products to principles and techniques. Establishing a shared vocabulary for how we as an industry talk about these things, which has helped bridge communities across languages, platforms and industries.
The Reactive Manifesto aims at condensing the knowledge about how to design highly scalable and reliable applications into a set of four required architecture traits: responsiveness, resiliency, elasticity and message-driven interactions.
About the Interviewee
Jonas Bonér is a co-founder and CTO of Typesafe, and creator of the Akka message-driven middleware project. He worked on core JVM-level clustering technology at Terracotta, and on the JRockit JVM at BEA. Jonas has also been a contributor to open source projects including the AspectWerkz AOP framework and the Eclipse AspectJ project. He is one of the original proponents of the Reactive Manifesto.