A Discussion with Allard Buijze on CQRS with the Axon framework
The Axon framework is a Java implementation of the Command and Query Responsibility Segregation (CQRS) pattern, which - as the Axon Framework website very capably describes it - is "an architectural pattern that makes a clear distinction between command processing – telling an application what to do – and query execution – which provides insight in the state of the application." Shortly before the release of the 0.4 iteration of the Axon framework, InfoQ talked with its creator, Allard Buijze, to find out more.
InfoQ: Can you generally describe what Axon is, and the reason it was conceived?
Allard Buijze (AB): CQRS is an architectural pattern that makes a clear distinction between two parts of an application: one that processes commands, and one that executes queries on data. We make this distinction because there are significant differences in the (non-functional) requirements for the two. The command processing part of the application is responsible for the validation and execution of commands. The command area does maintain information about the state of the application, but it does not expose that state. Instead, it exposes events that reflect any changes that occurred. These events are picked up by event listeners that update query data sources or do further processing of data based on these events. Implementing a CQRS architecture requires quite a lot of "plumbing" and boilerplate code. Much of that code is project-independent and quite hard to get right. Axon aims at providing the basic building blocks that are needed in most CQRS-style applications. Event dispatching, for example, is typically an asynchronous process, but you do want guaranteed delivery and, in some cases, sequential delivery. Axon provides building blocks and guidelines for the implementation of these features. Besides these core components, Axon wants to make integration with third-party libraries possible. Think of frameworks and libraries such as JMS, MQ and Spring Integration.
InfoQ: Can you describe a simple solution and the actors involved? i.e., what classes (or concepts) would I use to build a solution to add an address to an address book?
AB: Let's assume that we have an address book that allows us to store addresses for friends and relations. But to make it more functional, we'll implement a feature that allows you to change the address of each entry.
First, we need a component that will receive commands to insert entries, edit them and (these things happen) remove them again. A simple (but efficient) solution is a thin service layer. Another option could be an implementation of the command pattern. Let's choose the service for now and implement the "add entry" feature. The service implementation will then create an address book entry object ("AddressEntry") and send it to a repository. This repository will persist this entry and tell an event bus to asynchronously dispatch an "AddressEntryAddedEvent" to other parts of the application. The client will receive a confirmation that the address was changed. The repository that persisted the entry does not provide any means to query for the information it stores, other than by the identifier. This allows us to choose a storage that is very much focused on storing object trees and retrieving them by ID. There are tons of solutions that do this perfectly well and that scale a lot better than the traditional DBMS: NoSQL databases, distributed caches, and so on. Even a plain old file system will sometimes provide better performance than a DBMS.
An address book is useless if you can only search by ID. Therefore, we need a way to query this information. In CQRS, this is a completely different component than the one that deals with the commands. This is where the event comes back into the story. An Event Listener will pick up "AddressEntry"-related events and store the changes they represent in a DBMS. This time we might choose a solution like MySQL or PostgreSQL, as we want to provide different search and query possibilities.
Now we have a component that processes your commands (our service), one that keeps track of the actual state of our entries (the repository), and one that updates our query database (the event listener). But there is still no way to present this information to any front end of our choice. The last component we need is a data access component: a very thin component that only queries for data. This data layer will return (typically read-only) DTOs containing the query results to the front end.
Since a picture says more than a thousand words, here is a general overview of the components involved: [Figure: general overview of the components involved]
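As a rough illustration of the flow described in this answer, the components could be sketched in plain Java as follows. This is not the Axon API: every class and method name here (`SimpleEventBus`, `AddressEntryRepository`, `AddressBookService`, `AddressQueryModel`) is hypothetical, the event bus dispatches synchronously to keep the sketch short, and in-memory maps stand in for the real storage on both sides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical names throughout; plain Java, not the Axon API.

/** Event describing the fact that an entry was added. */
final class AddressEntryAddedEvent {
    final String entryId;
    final String name;
    final String address;

    AddressEntryAddedEvent(String entryId, String name, String address) {
        this.entryId = entryId;
        this.name = name;
        this.address = address;
    }
}

/** Minimal synchronous event bus; a real one would typically dispatch asynchronously. */
final class SimpleEventBus {
    private final List<Consumer<AddressEntryAddedEvent>> listeners = new ArrayList<>();

    void subscribe(Consumer<AddressEntryAddedEvent> listener) {
        listeners.add(listener);
    }

    void publish(AddressEntryAddedEvent event) {
        for (Consumer<AddressEntryAddedEvent> listener : listeners) {
            listener.accept(event);
        }
    }
}

/** Command side: stores entries by identifier only and publishes events. */
final class AddressEntryRepository {
    private final Map<String, String[]> store = new HashMap<>();
    private final SimpleEventBus eventBus;

    AddressEntryRepository(SimpleEventBus eventBus) {
        this.eventBus = eventBus;
    }

    void add(String id, String name, String address) {
        store.put(id, new String[] {name, address});
        eventBus.publish(new AddressEntryAddedEvent(id, name, address));
    }

    boolean contains(String id) {
        return store.containsKey(id);
    }
}

/** Thin service layer that receives the "add entry" command and validates it. */
final class AddressBookService {
    private final AddressEntryRepository repository;

    AddressBookService(AddressEntryRepository repository) {
        this.repository = repository;
    }

    void addEntry(String id, String name, String address) {
        if (repository.contains(id)) {
            throw new IllegalArgumentException("Duplicate entry: " + id);
        }
        repository.add(id, name, address);
    }
}

/** Query side: an event listener keeps a search-friendly view up to date. */
final class AddressQueryModel {
    private final Map<String, List<String>> addressesByName = new HashMap<>();

    void on(AddressEntryAddedEvent event) {
        addressesByName.computeIfAbsent(event.name, k -> new ArrayList<>()).add(event.address);
    }

    /** Read-only query used by the front end; returns a copy of the results. */
    List<String> findAddressesByName(String name) {
        return new ArrayList<>(addressesByName.getOrDefault(name, new ArrayList<>()));
    }
}
```

Wiring it up would amount to subscribing `AddressQueryModel::on` to the bus and calling `addEntry` on the service; the front end would only ever talk to `findAddressesByName`, never to the repository.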
InfoQ: Why would somebody choose this technology over the more traditional anemic domain model used in tandem with services?
AB: CQRS does not exclude the use of an anemic model per se. If the logic in the command-handling component is very straightforward, it is possible to implement that logic using the "transaction script pattern", where business logic is contained within the services. The same pros and cons apply as in the general rich versus anemic domain model discussion. Where this technology really differs from the traditional implementations is the separation of query logic and command logic. The command area of the application is responsible for maintaining a view on the current state of the application such that it can do validation on incoming commands. Any changes to this model are exclusively exposed through the events published at the moment this information changes. The biggest advantage of this architecture is that it allows you to design your model to meet specific requirements. Typically, the requirements for changing data are significantly different from those for querying data. A typical example of the friction between these requirements can often be seen in the size of a lot of SQL statements. SQL optimization is often used to squeeze the best performance out of a data source that is not really designed for executing that query.
Another advantage is application extensibility. In the more traditional layered architecture, services often need each other to do their job. Especially in long-running projects, the result is a big tangled mess that only the developers know how to maintain. CQRS enforces loose coupling between components through events.
InfoQ: How do you control the model / granularity of the events published by the updates made to the repository? If I send a command that updates a customer's credit card number and nothing else, does a customer update event get published? Does the event include the updated credit card data? Or, is it reasonable to just publish a customer-updated event and expect clients to figure out that the only delta between the old customer and the new one is the updated credit-card number?
AB: In a good CQRS implementation, all changes result in an event. Typically, commands and events exist in pairs. So, for an UpdateCreditCardNumberCommand, you will have a CreditCardNumberUpdatedEvent. This event must contain the data that has changed, since there is no other source of this information. This means that the event will contain the identifier of the customer whose number has changed as well as the new (and maybe old) credit card number.
One of the components might just store this new information in a database, while another might start a transaction to validate the credit card.
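A minimal sketch of such a command/event pair, in plain Java rather than the Axon API, might look like this. The field names and the `CustomerAccount` class are assumptions made for illustration; the point is only that the event carries the customer identifier and the changed data.

```java
// Hypothetical classes; field names are assumptions for illustration only.

/** Command: a request to change a customer's credit card number. */
final class UpdateCreditCardNumberCommand {
    final String customerId;
    final String newNumber;

    UpdateCreditCardNumberCommand(String customerId, String newNumber) {
        this.customerId = customerId;
        this.newNumber = newNumber;
    }
}

/** Event: the fact that the number changed, carrying all the changed data. */
final class CreditCardNumberUpdatedEvent {
    final String customerId;
    final String oldNumber;
    final String newNumber;

    CreditCardNumberUpdatedEvent(String customerId, String oldNumber, String newNumber) {
        this.customerId = customerId;
        this.oldNumber = oldNumber;
        this.newNumber = newNumber;
    }
}

/** Command-side object that handles the command and produces the paired event. */
final class CustomerAccount {
    private final String id;
    private String creditCardNumber;

    CustomerAccount(String id, String creditCardNumber) {
        this.id = id;
        this.creditCardNumber = creditCardNumber;
    }

    CreditCardNumberUpdatedEvent handle(UpdateCreditCardNumberCommand command) {
        if (!id.equals(command.customerId)) {
            throw new IllegalArgumentException("Command targets another customer");
        }
        String oldNumber = creditCardNumber;
        creditCardNumber = command.newNumber;
        // Listeners have no other source for this data, so the event carries it all.
        return new CreditCardNumberUpdatedEvent(id, oldNumber, command.newNumber);
    }
}
```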
In some cases, just being notified of a change is not enough. You want to know "why" information has changed. An address change, for example, could be a typo fix or a record of someone who actually moved to another address. In the latter case, you might want to trigger an automatic process of sending a postcard to congratulate them on their new home. In the former case, that would not be appropriate.
That means that both the commands and the events should indicate the intention of the change. You would then model the events differently. You will have an abstract class that defines the fact that an address changed, and two concrete subclasses of this class that indicate the different intentions of this change. Event Handlers that do not care about intentions, such as one that updates database tables, could focus on occurrences of the abstract event type.
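The event hierarchy described here could be sketched as follows. The class names (`AddressChangedEvent`, `AddressCorrectedEvent`, `CustomerMovedEvent`) and the two listeners are hypothetical and written in plain Java, not against the Axon API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event hierarchy; names are assumptions for illustration.

/** Abstract event: states only the fact that an address changed. */
abstract class AddressChangedEvent {
    final String customerId;
    final String newAddress;

    AddressChangedEvent(String customerId, String newAddress) {
        this.customerId = customerId;
        this.newAddress = newAddress;
    }
}

/** Intention: the stored address contained a mistake that was fixed. */
final class AddressCorrectedEvent extends AddressChangedEvent {
    AddressCorrectedEvent(String customerId, String newAddress) {
        super(customerId, newAddress);
    }
}

/** Intention: the customer actually moved to a new home. */
final class CustomerMovedEvent extends AddressChangedEvent {
    CustomerMovedEvent(String customerId, String newAddress) {
        super(customerId, newAddress);
    }
}

/** Does not care about intention: handles the abstract type. */
final class AddressTableUpdater {
    final Map<String, String> addressByCustomer = new HashMap<>();

    void on(AddressChangedEvent event) {
        addressByCustomer.put(event.customerId, event.newAddress);
    }
}

/** Cares about intention: reacts only when the customer really moved. */
final class PostcardSender {
    final List<String> postcardsSentTo = new ArrayList<>();

    void on(AddressChangedEvent event) {
        if (event instanceof CustomerMovedEvent) {
            postcardsSentTo.add(event.customerId);
        }
    }
}
```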
InfoQ: We've all heard about Staged Event Driven Architectures (SEDA), and understand the value of buffering processing until such time as the downstream components can catch up, but it looks like here we're buffering updates to the store of record itself. So, the DB is never overloaded, which is desirable, but how do you handle the situations when you need synchronous results? Suppose, for example, a system in which we update a customer's bank account balance at 2:05 PM, and another client loads a query at 2:07 PM. Let us further suppose that the first update - at 2:05 PM - has not propagated to the store of record (as it is still queued). Thus, events notifying clients of the new state haven't been issued yet. Does this signify an inconsistent read? Or, does CQRS imply that the queries be against the audit trail created by the events, and not a store of record?
AB: When you think of it, we've been fooling ourselves for a long time. We made ourselves and our application users believe that they are the only ones using an application. However, the real world is different. There are multiple users acting on the same data (either willingly or not) and the data each of those actors bases their decisions on is stale. These two facts, collaboration and staleness, are the driving forces behind CQRS. CQRS acknowledges the fact of the staleness and leverages it to its advantage. A side effect could be that the window of staleness is increased a little bit. However, this only goes for the query part of an application. The command part is always up-to-date. If multiple commands want to act on the same aggregate, they will have to wait for each other. In that case, it is up to the front end to decide whether it wants to wait for the command execution result or not.
Most commands are recordings of a fact that happened in the real world that needs to be reflected in the application, such as an address change. All a user is really interested in is whether the system has received the address change command and that it will process it. There is no problem if this process takes 24 hours. Many companies will thank a user for the information and send an email confirmation when the command is processed.
Validation of the command, however, is another story. If the house number field is missing, for example, you want to be able to give the user quick feedback. In the example, the querying user would just read stale data. In this case, quite stale, as it is over 2 minutes old. Nevertheless, is that really a bad thing? If the first user entered his data two minutes later, we wouldn't be thinking about this as a problem at all. Actually, you might argue that you can't ever get a consistent read at all. When data is presented on screen, other users might have acted on that data in the meantime.
Some rich clients, for example, might be developed in a way that they can process incoming events directly on-screen. That would reflect all changes to the model quite directly (though still with a slight delay) on a user's screen. Stock broking applications do a similar thing. Even there, however, the actual stock price is never shown on screen. It is always stale.
AB: At this moment, the framework is focused on providing building blocks for use within a single JVM. The next step, though, is to provide simple configuration options and building blocks that allow implementations in different JVMs to keep each other updated. Since we have acknowledged the fact that data is stale, we can just use the existing events and commands and publish them to other JVMs using existing technology. The choice to provide Spring Integration support at an early stage was mostly motivated by time constraints. Spring Integration already has connectors to a lot of messaging implementations, and connecting to Spring Integration is quite simple. In the future, Axon will likely contain connectors for other event and message related technologies, such as Mule, Apache Camel, MQ and JMS.
InfoQ: You describe easy integration as being a key benefit. The description implies that a hub-and-spoke architecture makes it easy for partner applications to have 'views' of the data, that they build up and optimize for their use case. Do you see Complex Event Processing as being useful for this sort of thing, to expose the patterns in the messaging as aggregate events?
AB: I think you hit the nail right on the head there. Integration is made easier because of the decoupling in the application. If you need data for the integration, you can maintain a separate data source that contains the data in exactly the way you need it. Complex event processing is an integration example I actually use quite often. It can be implemented in a completely non-intrusive way. Take fraud detection as an example. You could have a component that just listens to all events in an application. If a certain pattern of events occurs, it could send out a command to the application to block a certain user account.
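The fraud detection example could be sketched as a listener that watches events and sends a command back to the application when a pattern shows up. Everything specific here is an assumption: the pattern (three failed logins on one account), the `LoginFailedEvent` and `BlockAccountCommand` classes, and the `Consumer` that stands in for whatever accepts commands. It is plain Java, not the Axon API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical events and commands; the detection rule is an assumed example.

final class LoginFailedEvent {
    final String accountId;

    LoginFailedEvent(String accountId) {
        this.accountId = accountId;
    }
}

final class BlockAccountCommand {
    final String accountId;

    BlockAccountCommand(String accountId) {
        this.accountId = accountId;
    }
}

/** Listens to events non-intrusively and issues a command when a pattern occurs. */
final class FraudDetector {
    private static final int THRESHOLD = 3; // assumed rule: three failures per account

    private final Map<String, Integer> failuresByAccount = new HashMap<>();
    private final Consumer<BlockAccountCommand> commandSink;

    FraudDetector(Consumer<BlockAccountCommand> commandSink) {
        this.commandSink = commandSink;
    }

    void on(LoginFailedEvent event) {
        int failures = failuresByAccount.merge(event.accountId, 1, Integer::sum);
        // Send the command exactly once, when the threshold is first reached.
        if (failures == THRESHOLD) {
            commandSink.accept(new BlockAccountCommand(event.accountId));
        }
    }
}
```

The rest of the application does not need to know this component exists: it only consumes events that are already being published and produces an ordinary command.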
InfoQ: What are some real-world use cases you can think of that speak to the value of this architecture? Can it be applied, for example, in eventually consistent architectures like eBay's, which deals with many non-XA transactions across sharded databases?
AB: There are a few types of applications that would benefit from this architecture. However, some of the principles and ideas of CQRS would benefit any medium to large application in the end.
Applications that offer multiple views on the same data are one such group. CQRS uses events to update query data. It doesn't matter if there is one table showing this data or hundreds: each event listener component will keep its own data source up-to-date. Examples of views on information are overview pages, search engine indexes, reporting information, "what happened last month" emails, ad-hoc email notifications, and so on. LinkedIn is an example of such an application.
Another example is applications that are likely to be extended in the future. Since events are ubiquitous in any CQRS application, it is very easy to create add-ons that act upon these events. Changes are typically a lot less intrusive than with the more traditional approach. Some web shop implementations, for example, need to start small due to budget restrictions. However, after some time, they might require an inventory feature that is updated automatically when an order is made. Perhaps some time later, a shipping department needs to be kept up-to-date about orders ready for shipping.
The last type of application is the one that needs to scale. The traditional layered architecture is not one that supports scalability very well. Transaction management is a very heavy process in a scaled environment. XA transactions ask for a big price to pay on each transaction, while only one-in-many transactions actually go wrong. CQRS uses asynchronous updates through events. If any conflicting event is found (e.g. an item was bought, but the item is not in stock), we need to fix that in a compensating transaction. It is effectively the ACID vs. BASE discussion. It really is something that we (as developers) have to get used to. We have to educate ourselves, and our customers that things just go wrong from time to time. Instead of presenting the error to the user (with the default "please try again later" screen), we try to solve it in the background.
How information is queried, such as in the sharded databases you described, is really up to the developers to decide. With CQRS, you can make a different choice for different parts of your application. All these different parts are updated by the same source: the events.
The primary goal of Axon is to make a CQRS implementation easier to set up. We do this by providing basic implementations that take care of event dispatching, and basic abstract implementations of some commonly used components in such an architecture. Examples of the latter are basic implementations for repositories that use event sourcing or event handlers that update information in a relational database.
InfoQ: Anything else you want to add? Am I missing something important you feel should be added?
AB: An additional feature of CQRS that I personally like a lot is "event sourcing". In a more traditional layered architecture, audit trails are typically implemented as a side effect of the actual changes they describe. This makes an audit trail very error prone and thus unreliable. Event sourcing uses a different approach: let your event be the origin of all changes. In other words, changes will occur as a result of an event.
In practice, this means that the methods on an aggregate (the term as defined by Domain Driven Design, Eric Evans) will not change its state directly. Instead, they will generate an event that is then applied to the aggregate. When the aggregate is stored in the repository, the generated events are dispatched to the rest of the application. In that case, the repository will not store the aggregate itself, but just the latest (uncommitted) events. The event stream that results is the overview of all changes that ever occurred on the aggregate. If a repository needs to rebuild an aggregate from storage, it will replay all past events. The opportunities that event sourcing provides are not limited to audit trails. You could also use these events to set the application to a certain state in time and replay events back on components for debugging purposes. Axon contains building blocks that make it easier to use event sourcing. There is an abstract implementation of an aggregate that you can extend with the case-specific business logic and a basic implementation of a repository that stores events in the file system.
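To make the mechanics concrete, here is a minimal event-sourced aggregate in plain Java. It does not use Axon's abstract aggregate or repository; the class names (`EventSourcedAddressEntry`, `EntryAddressChangedEvent`) and the `takeUncommittedEvents` and `replay` methods are hypothetical, and a plain list stands in for the event store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of event sourcing; not Axon's aggregate or repository classes.

final class EntryAddressChangedEvent {
    final String newAddress;

    EntryAddressChangedEvent(String newAddress) {
        this.newAddress = newAddress;
    }
}

final class EventSourcedAddressEntry {
    private String address;
    private final List<EntryAddressChangedEvent> uncommitted = new ArrayList<>();

    /** Business method: validates, then generates an event instead of mutating state directly. */
    void changeAddress(String newAddress) {
        if (newAddress == null || newAddress.isEmpty()) {
            throw new IllegalArgumentException("Address must not be empty");
        }
        EntryAddressChangedEvent event = new EntryAddressChangedEvent(newAddress);
        apply(event);
        uncommitted.add(event);
    }

    /** The only place where state changes: as the result of an event. */
    private void apply(EntryAddressChangedEvent event) {
        this.address = event.newAddress;
    }

    /** What a repository would store and dispatch; clears the pending list. */
    List<EntryAddressChangedEvent> takeUncommittedEvents() {
        List<EntryAddressChangedEvent> events = new ArrayList<>(uncommitted);
        uncommitted.clear();
        return events;
    }

    /** Rebuilds the aggregate from storage by replaying all past events. */
    static EventSourcedAddressEntry replay(List<EntryAddressChangedEvent> history) {
        EventSourcedAddressEntry entry = new EventSourcedAddressEntry();
        for (EntryAddressChangedEvent event : history) {
            entry.apply(event); // replayed events are not recorded as new changes
        }
        return entry;
    }

    String address() {
        return address;
    }
}
```

Replaying only a prefix of the stored events would give the state of the entry at an earlier point in time, which is the debugging use mentioned above.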
Really interesting stuff! A few questions...
First of all, great work and congrats on the 0.4 release. I really like the idea of this event-based architecture a lot. I do have a question: I was wondering how you typically handle errors that occur while processing events in components. You described in the article that developers have to get used to dealing with this situation, but does Axon help you there in any way?
In the past I've been writing a lot of Swing apps where events are all over the place. It's not exactly comparable with Axon. But what is your experience with applications using CQRS: do they tend to become a bit magical, because all sorts of events and processing are handled in the 'background'? How do you keep track of what's going on in an application?
Re: Really interesting stuff! A few questions...
interesting questions. CQRS applications, when well built, allow a clear separation of contexts (as defined by Evans, DDD). From within each context, events are published to the entire application. Of course, this will lead to quite a large amount of events passing through. That might sound frightening if you're involved in the eternal battle against complexity.
Fortunately, this situation is not nearly as complex as it seems. First of all, events are clearly defined and reveal the intention of the change they represent. Second, contexts only have to take action upon events that are interesting to them. If a context is interested in events from another context, that is probably clearly defined in some requirement documentation. If you don't care about a certain event, you won't even know it exists at all.
Personally, I have noticed that a lot of requirements are described in an event-driven (when-then) way. For example: "When an Order is approved, then we want to send the purchaser a confirmation email". This indicates that there must be an Event Listener in the application that listens for "OrderApprovedEvent" and sends emails.
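Such a when-then requirement maps almost one-to-one onto a listener. In the plain-Java sketch below, `OrderApprovedEvent` is the event named above, while its fields, the `EmailSender` interface, and the `OrderConfirmationListener` class are assumptions for illustration rather than Axon types.

```java
// Hypothetical sketch: "when an Order is approved, then send a confirmation email".

final class OrderApprovedEvent {
    final String orderId;
    final String purchaserEmail;

    OrderApprovedEvent(String orderId, String purchaserEmail) {
        this.orderId = orderId;
        this.purchaserEmail = purchaserEmail;
    }
}

/** Assumed abstraction over whatever actually delivers mail. */
interface EmailSender {
    void send(String recipient, String subject);
}

/** The "then" part of the requirement, triggered by the "when" event. */
final class OrderConfirmationListener {
    private final EmailSender emailSender;

    OrderConfirmationListener(EmailSender emailSender) {
        this.emailSender = emailSender;
    }

    void on(OrderApprovedEvent event) {
        emailSender.send(event.purchaserEmail, "Order " + event.orderId + " approved");
    }
}
```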
What you describe as "processing in the 'background'" is not really something I personally experience. True, there is some separation on a place where "regular" architectures don't have one: between the command processing and the query database. That is something you will very quickly get used to when implementing an application with CQRS. I see it as a clear separation of concerns.
With asynchronous event processing, you will have to make sure your client is "aware" of the fact that changes are not visible immediately. Axon does not explicitly provide building blocks that allow you to do that. There are, of course, building blocks that allow you to process events asynchronously. But client-awareness has more to do with UI choices than anything else. Blogger, for example, uses a user's session to store comments that user added for a specific blog. That way, it looks like the comment is placed immediately. But any other user will not be able to see that comment at that time.
Asynchronous event processing (and eventual consistency) is a choice you can make if you want to be able to scale out. With the Axon Framework, it is very easy to implement either synchronous processing or its asynchronous counterpart, or even switch between them.
When you use eventual consistency, you no longer have a mechanism to directly notify the user that something went wrong. Raising an exception will not get you very far. Instead, you need to take some countermeasures. If you sold an item that turns out to have been sold to another person at the same time and it is no longer in stock, you can raise an "ItemOversoldEvent". This event could trigger a listener to send an email to a sales representative to ask him to contact the customer. So instead of "bugging" the customer with your problem, you send it to the business to solve it. They can then make a decision on how to solve the problem. Perhaps they decide to wait for the new shipment of items, or they call the customer to ask what he wants to do.
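A sketch of that countermeasure in plain Java: an inventory listener that raises an "ItemOversoldEvent" instead of throwing an exception when stock has run out. The `ItemSoldEvent` class, the field names, and the `Consumer` standing in for the event bus are assumptions; none of this is Axon API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of raising a business event instead of an exception.

final class ItemSoldEvent {
    final String itemId;
    final String customerId;

    ItemSoldEvent(String itemId, String customerId) {
        this.itemId = itemId;
        this.customerId = customerId;
    }
}

final class ItemOversoldEvent {
    final String itemId;
    final String customerId;

    ItemOversoldEvent(String itemId, String customerId) {
        this.itemId = itemId;
        this.customerId = customerId;
    }
}

/** Keeps stock levels up to date and reports conflicts as events. */
final class InventoryListener {
    private final Map<String, Integer> stock = new HashMap<>();
    private final Consumer<ItemOversoldEvent> eventSink; // stands in for the event bus

    InventoryListener(Consumer<ItemOversoldEvent> eventSink) {
        this.eventSink = eventSink;
    }

    void setStock(String itemId, int quantity) {
        stock.put(itemId, quantity);
    }

    void on(ItemSoldEvent event) {
        int available = stock.getOrDefault(event.itemId, 0);
        if (available > 0) {
            stock.put(event.itemId, available - 1);
        } else {
            // The sale already happened; hand the conflict to the business to resolve.
            eventSink.accept(new ItemOversoldEvent(event.itemId, event.customerId));
        }
    }
}
```

A further listener on `ItemOversoldEvent` could then notify a sales representative, as described above.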
If errors are technical (e.g. the database is down), Axon Framework provides a mechanism to retry event processing until the database is back up. Axon makes a clear distinction between transient exceptions (which can disappear without changing application logic) and non-transient ones. The latter are caused by programming mistakes and will cause events to be discarded. Since event logs are still available, you have all the information you need to debug.
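The distinction between transient and non-transient failures can be illustrated with a small retry loop. This is a generic plain-Java sketch, not Axon's actual retry mechanism: `TransientProcessingException` and `RetryingEventProcessor` are hypothetical names, and the sketch gives up after a fixed number of attempts and does not wait between them, whereas a real implementation would typically back off and keep retrying.

```java
import java.util.function.Consumer;

// Hypothetical sketch of retry-on-transient-failure; not Axon's implementation.

/** Marks failures that may disappear on their own, e.g. a database that is down. */
class TransientProcessingException extends RuntimeException {
    TransientProcessingException(String message) {
        super(message);
    }
}

final class RetryingEventProcessor<E> {
    private final Consumer<E> handler;
    private final int maxAttempts; // assumed bound, to keep the sketch terminating

    RetryingEventProcessor(Consumer<E> handler, int maxAttempts) {
        this.handler = handler;
        this.maxAttempts = maxAttempts;
    }

    /** Returns true if the event was handled, false if it was discarded. */
    boolean process(E event) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(event);
                return true;
            } catch (TransientProcessingException transientFailure) {
                // Transient: try again (a real processor would wait before retrying).
            } catch (RuntimeException nonTransientFailure) {
                // Non-transient: retrying cannot help, so the event is discarded.
                return false;
            }
        }
        return false; // still failing after the last attempt
    }
}
```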
I hope this answers your questions.
I find your post very interesting, and complementary to "Unleash/Unshackle your Domain" by Greg Young.
I understand your point about separation of DB for commands (writes) and requests (reads).
But one point is still unclear to me: the query DB is at the same time readable and writable, since it is updated by some events.
Should we separate this DB further (which leads to infinite recursion)?
I guess the answer is no, but why separate the primary storage into two storages (one for write and one for read), and why not separate the "read" storage, since it is also writable?
Sorry for my poor English, but I think you will get my point.
I surely missed something.
Needs a shorter description
Re: Query Database
the intention of the query database is completely different than that of the command database, and so are the requirements for them. The query database is meant for storing data to be used (typically) in the UI. That means data is optimized (read: denormalized) to fit the information needs on the screen. The command database stores the relevant parts of the current state, so that you can validate incoming commands and see what state changes as a result. That's a very important distinction to make.
If you just split a storage into two (one for read and one for write), you'll probably end up with the same data model in both. But those two different usages ask for a different model altogether. The "query model" looks completely different than the "command model".
CQRS is not purely about splitting a database in two for performance reasons, it is about splitting the model in *multiple parts* for both functional and non-functional reasons. And by splitting the model in several parts, we have the freedom to choose the best suitable storage for each of those parts.
Re: Query Database
I guess there is no way we could magically get rid of locks on storage that is both read and written; all we can do is optimize each storage according to its intent.
Things to persist in the write-side
After reading the interviews I have the following questions:
1. What's to persist on the write-side? Is it the aggregate root entities, non-root entities or the series of events?
2. Related to #1 above, do we still need to store all the properties of entities in the write-side just like before using ORM mapping?
Thanks & Regards