Developing Transactional Microservices Using Aggregates, Event Sourcing and CQRS - Part 1
The microservice architecture is becoming increasingly popular. It is an approach to modularity that functionally decomposes an application into a set of services. It enables teams developing large, complex applications to deliver better software faster. They can adopt new technology more easily since they can implement each service with the latest and most appropriate technology stack. The microservice architecture also improves an application’s scalability by enabling each service to be deployed on the optimal hardware.
Microservices are not, however, a silver bullet. In particular, domain models, transactions and queries are surprisingly resistant to functional decomposition. As a result, developing transactional business applications using the microservice architecture is challenging. In this article, I describe a way to develop microservices that solves these problems by using Domain Driven Design, Event Sourcing and Command Query Responsibility Segregation (CQRS). Let’s first look at the challenges developers face when writing microservices.
Microservice Development Challenges
Modularity is essential when developing large, complex applications. Most modern applications are too large to be developed by an individual. They are also too complex to be understood by a single person. Applications must be decomposed into modules that are developed and understood by a team of developers. In a monolithic application, modules are defined using programming language constructs such as Java packages. However, this approach tends not to work well in practice. Long-lived, monolithic applications usually degenerate into big balls of mud.
The microservice architecture uses services as the unit of modularity. Each service corresponds to a business capability, which is something an organization does in order to create value. A microservices-based online store, for example, consists of various services including an Order Service, a Customer Service, and a Catalog Service.
Each service has an impermeable boundary that is difficult to violate. As a result, the modularity of the application is much easier to preserve over time. The microservice architecture has other benefits including the ability to deploy and scale services independently.
Unfortunately, decomposing an application into services is not as easy as it sounds. Several different aspects of applications - domain models, transactions and queries - are difficult to decompose. Let’s look at the reasons why.
Problem #1 - Decomposing a Domain Model
The Domain Model pattern is a good way to implement complex business logic. The domain model for an online store would include classes such as Order, OrderLineItem, Customer and Product. In a microservices architecture, the Order and OrderLineItem classes are part of the Order Service, the Customer class is part of the Customer Service, and the Product class belongs to the Catalog Service.
The challenge with decomposing the domain model, however, is that classes often reference one another. For example, an Order references its Customer and an OrderLineItem references a Product. What do we do about references that want to span service boundaries? Later on you will see how the concept of an Aggregate from Domain-Driven Design (DDD) solves this problem.
Microservices and Databases
A distinctive feature of the microservice architecture is that the data owned by a service is only accessible via that service’s API. In the online store, for example, the OrderService has a database that includes the ORDERS table and the CustomerService has its own database, which includes the CUSTOMERS table. Because of this encapsulation, the services are loosely coupled. At development time, a developer can change their service’s schema without having to coordinate with developers working on other services. At runtime, the services are isolated from each other. For example, a service will never be blocked waiting for a database lock owned by another service. Unfortunately, the functional decomposition of the database makes it difficult to maintain data consistency and to implement many kinds of queries.
Problem #2 - Implementing Transactions That Span Services
A traditional monolithic application can rely on ACID transactions to enforce business rules (a.k.a. invariants). Imagine, for example, that customers of the online store have a credit limit that must be checked before creating a new order. The application must ensure that potentially multiple concurrent attempts to place an order do not exceed a customer’s credit limit. If Orders and Customers reside in the same database it is trivial to use an ACID transaction (with the appropriate isolation level) as follows:
BEGIN TRANSACTION
…
SELECT ORDER_TOTAL FROM ORDERS WHERE CUSTOMER_ID = ?
…
SELECT CREDIT_LIMIT FROM CUSTOMERS WHERE CUSTOMER_ID = ?
…
INSERT INTO ORDERS …
…
COMMIT TRANSACTION
Sadly, we cannot use such a straightforward approach to maintain data consistency in a microservices-based application. The ORDERS and CUSTOMERS tables are owned by different services and can only be accessed via APIs. They might also be in different databases.
The traditional solution is 2PC (a.k.a. distributed transactions), but this is not a viable technology for modern applications. The CAP theorem requires you to choose between availability and consistency when a network partition occurs, and availability is usually the better choice. Moreover, many modern technologies, such as most NoSQL databases, do not even support ACID transactions, let alone 2PC. Maintaining data consistency is essential, so we need another solution. Later on you will see that the solution is to use an event-driven architecture based on a technique known as event sourcing.
Problem #3 - Querying and Reporting
Maintaining data consistency is not the only challenge. Another problem is querying and reporting. In a traditional monolithic application it is extremely common to write queries that use joins. For example, it is easy to find recent customers and their large orders using a query such as:
SELECT *
FROM CUSTOMER c, ORDER o
WHERE c.ID = o.CUSTOMER_ID
  AND o.ORDER_TOTAL > 100000
  AND o.STATE = 'SHIPPED'
  AND c.CREATION_DATE > ?
We cannot use this kind of query in a microservices-based online store. As mentioned earlier, the ORDERS and CUSTOMERS tables are owned by different services and can only be accessed via APIs. Some services might not even be using a SQL database. Others, as you will see below, might use an approach known as Event Sourcing, which makes querying even more challenging. Later on, you will learn that the solution is to maintain materialized views using an approach known as Command Query Responsibility Segregation (CQRS). But first, let’s look at how Domain-Driven design (DDD) is an essential tool for the development of domain model-based business logic for microservices.
DDD Aggregates are the Building Blocks of Microservices
As you can see, there are several problems that must be solved in order to successfully develop business applications using the microservice architecture. The solution to some of these problems can be found in the must-read book Domain-Driven Design by Eric Evans. This book, published in 2003, describes an approach to designing complex software that is very useful when developing microservices. In particular, Domain-Driven Design enables you to create a modular domain model that can be partitioned across services.
What is an Aggregate?
In Domain-Driven Design, Evans defines several building blocks for domain models. Many have become part of everyday developer language including entity, which is an object with a persistent identity; value object, which is an object that has no identity and is defined by its attributes; service, which contains business logic that doesn’t belong in an entity or value object; and repository, which represents a collection of persistent entities. One building block, the aggregate, has mostly been ignored by developers except by those who are DDD purists. It turns out, however, that aggregates are key to developing microservices.
An aggregate is a cluster of domain objects that can be treated as a unit. It consists of a root entity and possibly one or more other associated entities and value objects. For example, the domain model for the online store contains aggregates such as Order and Customer. An Order aggregate consists of an Order entity (the root), one or more OrderLineItem value objects along with other value objects such as a delivery Address and PaymentInformation. A Customer aggregate consists of the Customer root entity along with other value objects such as DeliveryInfo and PaymentInformation.
Using aggregates decomposes a domain model into chunks, which are individually easier to understand. It also clarifies the scope of operations such as load and delete. An aggregate is usually loaded in its entirety from the database. Deleting an aggregate deletes all of its objects. The benefit of aggregates, however, goes far beyond modularizing a domain model. That is because aggregates must obey certain rules.
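To make the aggregate structure concrete, here is a minimal sketch in Java. The class and field names are illustrative only, not taken from any particular framework:

```java
import java.util.ArrayList;
import java.util.List;

// Root entity of the Order aggregate: the only entry point for changes.
class Order {
    private final long id;                  // persistent identity of the root
    private final List<OrderLineItem> lineItems = new ArrayList<>();
    private final Address deliveryAddress;  // value object, no identity of its own

    Order(long id, Address deliveryAddress) {
        this.id = id;
        this.deliveryAddress = deliveryAddress;
    }

    // Clients modify line items only through the root, so the root can
    // enforce any invariants that span the whole aggregate.
    void addLineItem(long productId, int quantity, long unitPrice) {
        lineItems.add(new OrderLineItem(productId, quantity, unitPrice));
    }

    long orderTotal() {
        return lineItems.stream()
                        .mapToLong(li -> li.quantity * li.unitPrice)
                        .sum();
    }
}

// Value objects: defined entirely by their attributes.
class OrderLineItem {
    final long productId;   // identity reference to the Product aggregate
    final int quantity;
    final long unitPrice;   // in cents

    OrderLineItem(long productId, int quantity, long unitPrice) {
        this.productId = productId;
        this.quantity = quantity;
        this.unitPrice = unitPrice;
    }
}

class Address {
    final String street, city, zip;

    Address(String street, String city, String zip) {
        this.street = street;
        this.city = city;
        this.zip = zip;
    }
}
```

Note how OrderLineItem and Address have no identity of their own; they are loaded from the database and deleted together with the Order root.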
Inter-Aggregate References Must Use Primary Keys
The first rule is that aggregates reference each other by identity (e.g. primary key) instead of object references. For example, an Order references its Customer using a customerId rather than a reference to the Customer object. Similarly, an OrderLineItem references a Product using a productId.
This approach is quite different than traditional object modeling, which considers foreign keys in the domain model to be a design smell. The use of identity rather than object references means that the aggregates are loosely coupled. You can easily put different aggregates in different services. In fact, a service’s business logic consists of a domain model that is a collection of aggregates. For example, the OrderService contains the Order aggregate and the CustomerService contains the Customer aggregate.
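A minimal sketch of this rule (field names are assumptions for illustration): the Order holds only the Customer’s primary key, so any code that needs customer data must look it up through the Customer Service’s API rather than traversing an object reference.

```java
// Aggregates reference one another by primary key, not by object reference.
class Order {
    private final long id;          // this aggregate's own identity
    private final long customerId;  // identity of the Customer aggregate;
                                    // the Customer object itself lives in
                                    // (and is owned by) the Customer Service

    Order(long id, long customerId) {
        this.id = id;
        this.customerId = customerId;
    }

    long getCustomerId() {
        return customerId;
    }
}
```

Because the only coupling between the two aggregates is a key value, nothing prevents them from living in different services backed by different databases.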
One Transaction Creates or Updates One Aggregate
The second rule that aggregates must obey is that a transaction can only create or update a single aggregate. When I first read about this rule many years ago, it made no sense! At the time, I was developing traditional monolithic, RDBMS-based applications and so transactions could update arbitrary data. Today, however, this constraint is perfect for the microservice architecture. It ensures that a transaction is contained within a service. This constraint also matches the limited transaction model of most NoSQL databases.
When developing a domain model, a key decision you must make is how large to make each aggregate. On the one hand, aggregates should ideally be small. Small aggregates improve modularity by separating concerns, and they are more efficient to work with since aggregates are typically loaded in their entirety. Also, because updates to a given aggregate happen sequentially, using fine-grained aggregates increases the number of simultaneous requests that the application can handle, and so improves scalability. It also improves the user experience, since it reduces the likelihood of two users attempting to update the same aggregate. On the other hand, because an aggregate is the scope of a transaction, you might need to define a larger aggregate in order to make a particular update atomic.
For example, earlier I described how in the online store’s domain model, Order and Customer are separate aggregates. An alternative design is to make Orders part of the Customer aggregate. A benefit of a larger Customer aggregate is that the application can enforce the credit check atomically. A drawback of this approach is that it combines order and customer management functionality into the same service. It also reduces scalability since transactions that update different orders for the same customer would be serialized. Similarly, two users might conflict if they attempted to edit different orders for the same customer. Also, as the number of orders grows it will become increasingly expensive to load a Customer aggregate. Because of these issues, it is best to make aggregates as fine-grained as possible.
Even though a transaction can only create or update a single aggregate, applications must still maintain consistency between aggregates. The Order Service must, for example, verify that a new Order aggregate will not exceed the Customer aggregate’s credit limit. There are a couple of different ways to maintain consistency. One option is to cheat and create and/or update multiple aggregates in a single transaction. This is only possible if all aggregates are owned by the same service and persisted in the same RDBMS. The other, more correct option is to maintain consistency between aggregates using an eventually consistent, event-driven approach.
Using Events to Maintain Data Consistency
In a modern application, there are various constraints on transactions that make it challenging to maintain data consistency across services. Each service has its own private data, yet 2PC is not a viable option. Moreover, many applications use NoSQL databases, which don’t support local ACID transactions, let alone distributed transactions. Consequently, a modern application must use an event-driven, eventually consistent transaction model.
What is an Event?
According to Merriam-Webster, an event is "something that happens".
In this article, we define a domain event as something that has happened to an aggregate. An event usually represents a state change. Consider, for example, an Order aggregate in the online store. Its state-changing events include Order Created, Order Cancelled, and Order Shipped. Events can also represent failed attempts to violate a business rule, such as a Customer’s credit limit.
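Such events might be modeled as simple immutable classes; the class hierarchy and field layouts below are assumptions for illustration, not a prescribed event schema:

```java
// A domain event is an immutable record of something that has happened
// to an aggregate, identified by that aggregate's primary key.
abstract class DomainEvent {
    final long aggregateId;

    DomainEvent(long aggregateId) {
        this.aggregateId = aggregateId;
    }
}

// A state-change event: a new Order came into existence.
class OrderCreated extends DomainEvent {
    final long customerId;
    final long orderTotal;  // in cents

    OrderCreated(long orderId, long customerId, long orderTotal) {
        super(orderId);
        this.customerId = customerId;
        this.orderTotal = orderTotal;
    }
}

// Another state-change event on the Order aggregate.
class OrderShipped extends DomainEvent {
    OrderShipped(long orderId) {
        super(orderId);
    }
}

// Not every event records a state change: this one, published by the
// Customer aggregate, records a failed attempt to violate the
// credit-limit business rule.
class CreditLimitExceeded extends DomainEvent {
    final long orderId;

    CreditLimitExceeded(long customerId, long orderId) {
        super(customerId);
        this.orderId = orderId;
    }
}
```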
Using an Event-Driven Architecture
Services use events to maintain consistency between aggregates as follows: an aggregate publishes an event whenever something notable happens, such as a change to its state or an attempted violation of a business rule. Other aggregates subscribe to events and respond by updating their own state.
The online store verifies the customer’s credit limit when creating an order using a sequence of steps:
- An Order aggregate, which is created with a NEW status, publishes an OrderCreated event
- The Customer aggregate consumes the OrderCreated event, reserves credit for the order and publishes a CreditReserved event
- The Order aggregate consumes the CreditReserved event, and changes its status to APPROVED
If the credit check fails due to insufficient funds, the Customer aggregate publishes a CreditLimitExceeded event. This event does not correspond to a state change but instead represents a failed attempt to violate a business rule. The Order aggregate consumes this event and changes its state to CANCELLED.
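The credit-check choreography above can be sketched as follows. This toy version is synchronous and in-process purely for illustration; in a real system the events would flow asynchronously through a message broker, and the Order would remain in the NEW state until the Customer’s reply event arrives. All class and method names here are hypothetical:

```java
// Illustrative sketch of the credit-check choreography between the
// Order and Customer aggregates (synchronous stand-in for event flow).
class CreditCheckFlow {
    enum OrderStatus { NEW, APPROVED, CANCELLED }

    static class Customer {
        final long creditLimit;     // in cents
        long creditReserved = 0;

        Customer(long creditLimit) {
            this.creditLimit = creditLimit;
        }

        // Reacts to OrderCreated: either reserves credit (and would
        // publish CreditReserved) or refuses (CreditLimitExceeded).
        boolean tryReserveCredit(long amount) {
            if (creditReserved + amount > creditLimit) {
                return false;            // -> CreditLimitExceeded event
            }
            creditReserved += amount;    // -> CreditReserved event
            return true;
        }
    }

    static class Order {
        OrderStatus status = OrderStatus.NEW;  // created with NEW status

        void onCreditReserved()      { status = OrderStatus.APPROVED; }
        void onCreditLimitExceeded() { status = OrderStatus.CANCELLED; }
    }

    // Wires the steps together: OrderCreated -> CreditReserved or
    // CreditLimitExceeded -> APPROVED or CANCELLED.
    static OrderStatus placeOrder(Customer customer, long orderTotal) {
        Order order = new Order();          // would publish OrderCreated
        if (customer.tryReserveCredit(orderTotal)) {
            order.onCreditReserved();       // consumes CreditReserved
        } else {
            order.onCreditLimitExceeded();  // consumes CreditLimitExceeded
        }
        return order.status;
    }
}
```

Even in this simplified form, the key property is visible: each step creates or updates exactly one aggregate, and the aggregates coordinate only through the events they exchange.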
Microservice Architecture as a Web of Event-Driven Aggregates
In this architecture, each service's business logic consists of one or more aggregates. Each transaction performed by a service updates or creates a single aggregate. The services maintain data consistency between aggregates by using events.
A distinctive benefit of this approach is that the aggregates are loosely coupled building blocks. They can be deployed as a monolith or as a set of services. At the start of a project you could use a monolithic architecture. Later, as the size of the application and the development team grows, you can then easily switch to a microservices architecture.
The Microservice architecture functionally decomposes an application into services, each of which corresponds to a business capability. A key challenge when developing microservice-based business applications is that transactions, domain models, and queries resist decomposition. You can decompose a domain model by applying the idea of a Domain Driven Design aggregate. Each service’s business logic is a domain model consisting of one or more DDD aggregates.
Within each service, a transaction creates or updates a single aggregate. Because 2PC is not a viable technology for modern applications, events are used to maintain consistency between aggregates (and services). In part 2, we describe how to implement a reliable event-driven architecture using Event Sourcing. We also show how to implement queries in a microservice architecture using Command Query Responsibility Segregation.
About the Author
Chris Richardson is a developer and architect. He is a Java Champion and the author of POJOs in Action, which describes how to build enterprise Java applications with frameworks such as Spring and Hibernate. Chris was also the founder of the original CloudFoundry.com. He consults with organizations to improve how they develop and deploy applications and is working on his third startup. You can find Chris on Twitter @crichardson and on Eventuate.
A hell of a lot to give up, on highly dubious rationale
However, it glosses over the reason why one would want to do this (maintainability), and it's by no means clear that the resultant architecture would solve the problem either.
The article states: "In a monolithic application, modules are defined using programming language constructs such as Java packages. However, this approach tends to not work well in practice. Long lived, monolithic applications usually degenerate into big balls of mud."
I would be surprised if Java packages are used by many monolithic apps as the means of modularizing the codebase; I suspect Maven/Gradle modules are more often used. This then allows the build tools themselves to ensure that there are no cyclic dependencies between modules. It does take some discipline to do this, granted, but there are application frameworks out there that explicitly support/encourage this approach.
In contrast, no such tooling exists for microservices architectures to help prevent them from also resulting in a mass of cyclic dependencies. Indeed, since such dependencies are only resolved at runtime rather than at build time, I think it's inevitable that such cyclic dependencies will occur in a long-lived microservices application. And overall, a microservices architecture is likely to require an awful lot more discipline (not to mention overall DevOps maturity) to get right.
Meantime, as the article points out, moving to microservices as a way to tackle modularity (rather than using a modular-monolith approach) means giving up a hell of a lot of useful things: transactional changes across modules; querying/reporting across modules. It also means the required use of complex compensating actions (implying a worse UI/UX) and a clumsier programming model (FKs instead of regular object references).
I remain unconvinced.
Re: A hell of a lot to give up, on highly dubious rationale
In contrast, no such tooling exists for microservices architectures to help prevent it also resulting in a mass of cyclic dependencies.
Personally I think it's an antipattern for two bounded contexts to depend on each other (causing the cyclic dependency). But even if that were the case, microservices are generally integrated via a stateless protocol (like HTTP). This means that, as long as requests are not made to the services during initialization, they should hypothetically be able to reach each other once fully initialized. This is generally not the case when wiring dependencies within the same process with a framework like Spring.
Another fundamentally important piece to microservices is the ability to have multiple versions of the same services alive at the same time. Netflix and Uber are doing this successfully, and our company is also starting to reap the same benefits (though my experience is limited to a couple of months). The key to doing this is having a service discovery tool that allows you to pull a list of available services that meet your version requirements.
We've followed Netflix's lead by using Semver (semver.org/), which allows our services to select the latest backwards compatible release of a service. In this way, even if you have a cyclic dependency between services, if one service (A) initializes faster than the next (B), A will be able to interact with a previous version of B until the latest B becomes available.
Don't get me wrong, this is a heck of a lot more work than using a monolith. However, the advantage of the approach, outside traditional benefits like independent scaling of services, is that we can apply development resources more cleanly (instead of having everyone in the same code base).
Re: A hell of a lot to give up, on highly dubious rationale
Personally I think it's an antipattern for two bounded contexts to depend on each other (causing the cyclic dependency).
Agreed. But having good ol' fashioned references between object/components (rather than HTTP requests) means that one can use tried-n-tested patterns such as the dependency inversion principle; or in-memory event bus if you want to get a bit more sophisticated.
Another fundamentally important piece ... the ability to have multiple versions of the same services alive at the same time.
That's a very valid point. OSGi does of course have this problem solved through separate class loader for each module, but Java 9 (Jigsaw) has taken an altogether less sophisticated approach which means that jar hell may yet continue as an issue for monoliths.
[One] advantage of the approach ... is that we can apply development resources more cleanly (instead of having everyone in the same code base).
A monolith doesn't mean one codebase. Rather it can and should combine code from multiple (git) repos, each publishing its own (maven) artifact(s).
2-phase commit IS a viable approach to publish and consume events
Driving a complicated system complex
Personally, I think that applying EDA the way you described introduces some risks. There is a difference between data consistency between read and write models in CQRS and interactions between aggregates in a domain model. Letting aggregates subscribe directly to events emitted by other aggregates simply hides the business intent of the particular interaction, resulting in ever-decreasing maintainability as your codebase or system grows. Referring to Cynefin, the system moves from complicated to complex.
Re: Driving a complicated system complex
Fowler says you shouldn't
Although, your perspective might differ from theirs...
Re: Driving a complicated system complex
I assume that sticking to the rule of thumb to "Only Modify One Aggregate Per Transaction" - without knowing why - won't kill you. Whereas the PITA potential of applying EDA in an "Aggregate To Aggregate" fashion grows along with the system. As you referred to Vernon: along with others, he suggests that a Microservice should map to a Bounded Context. Allowing aggregates to subscribe directly to events published by aggregates belonging to another BC weakens boundaries. At the least, a common dependency (the locus of event definitions) is needed, constraining both teams in their ability to move forward independently. As event subscriptions are often part of infrastructure configuration (e.g. with ESBs) and the separation of Microservices/BoundedContexts likely aligns with a separation of codebases, it gets hard to tell "what the system does" from reading the source code. To me, this is what drives the system from complicated to complex. The relationship between cause and effect can no longer be perceived beforehand (at least from the perspective of a developer working on a single BC).
I don't think that EDA is bad per se, but we should think about how to apply it in order to minimize accidental complexity.
Re: Driving a complicated system complex
As event subscriptions are often part of infrastructure configuration (i.e. with ESBs) and the separation of Microservices/BoundedContexts likely aligns with a separation of codebases, it gets hard to tell "what the system does" from reading the sourcecode.
In the example given in the article, wouldn't the programmer working on the BC of Order Creation know that his order status is dependent on the message received from the Customer Aggregate (BC), and thus make sense of the entire business flow? IMHO, it is all the better that the programmer working on the Order Creation BC is agnostic of the Customer Credit BC and is just concerned with order creation and its status based on certain events.
Would love to hear from you if you have difference of opinion or if I have missed a point or a two.
Re: A hell of a lot to give up, on highly dubious rationale
The article is skirting the issue of transactions, giving the impression that an asynchronous event processing model will make up for transactions...
... but if someone were to re-charge the account between raising the CreditLimitExceeded event and the order changing its status to CANCELLED, you would still get somewhat inconsistent behavior, which would get worse the more inter-dependent events there are in an "order". One wouldn't get this with transactional isolation and locking... not to mention the additional complexity resulting from having no rollback (of a shared transaction) across failed components. But... it's good enough, meaning it likely won't be a big issue and the customer would send the order again. It works for many use cases.
The success of a microservices architecture, be it Corba (sic), WS or REST, is a result of good granularity decisions. Poor decisions will decrease some development complexity but result in overall solution complexity and brittleness, such as requiring a DW for cross-aggregate queries etc.