At the recent JEEConf conference in Kiev, Amitay Horwitz described how he and his team implemented an event-sourced invoice system, the challenges they experienced after running in production for 2 ½ years, and how they implemented a new design using Kafka Streams.
Horwitz, software engineer at Wix, started working with a new invoicing service together with his team in 2015 with the goal of helping their customers manage invoices and receive payments online. When designing the new service, they wanted to create a small and simple library that was non-intrusive, maintained data integrity, and with the ability to easily add custom views. To accomplish all goals, they decided to implement the service using an event-sourced architecture.
Despite the effort to create a simple design, the library became quite big. They were also experiencing problems where customers sometimes couldn’t see a newly created invoice because of the eventual consistency from write to reads – a request to create an invoice updates the write model with a new invoice but the following request reads from the read model before it has been updated and therefore the new invoice is not included.
Their biggest problem though, was rebuilding views. Ensuring that passed data was handled before new events when a new event handler was added, and triggering rebuilds despite no events coming in, proved to be more complex than they anticipated, especially in their distributed environment with events coming from various servers. These problems made Horwitz search for an alternative architecture, while keeping the benefits of event-sourcing.
Horwitz describes Kafka as a replicated, fault-tolerant, and distributed append-only log. It’s often used for pub-sub or as a queue, but he notes that it can do much more. The basic structure in Kafka is a topic, a logical queue, that is partitioned. A producer will push messages to each partition depending on the key in the message, and a consumer can then consume these messages. Two important features critical for an event-sourced system are that the ordering between messages is maintained within a single partition, and that messages can be stored even after they are consumed.
Kafka Streams add stream processing to Kafka. It has two main abstractions:
- Streams that Horwitz sees as data in flight, an unbounded ordered and replayable sequence of immutable data, which makes it interesting for an event-sourced system.
- Tables that he sees at data at rest. A table stores a point-in time view of aggregated data that is updated as new messages are received.
In the new design of the invoice service which uses Kafka, they have a snapshot state store that keeps the current state of each aggregate. After receiving a command from the command stream, the command handler reads the current state of the corresponding aggregate from the state store. The handler can then decide if the command succeeded or failed and return the result through a results stream. If the command succeeded, events are created and pushed to the events store. Next, the stream with the new events is read, and the aggregate in the state store is updated to its new state. He notes that the command handler logic can be written in a very concise and declarative way, in his example with just 60 lines of Scala code.
In the new architecture Kafka is in the centre, with microservices communicating with Kafka and between each other also through Kafka. They can also push information into Kafka or pull information out for instance when creating analytics reports. In conclusion Horwitz notes that the new design has given them several advantages:
- A simple and declarative system
- Eventual consistency is now embraced and handled gracefully
- It’s easy to add or change views
- Increased scalability and fault-tolerance from the use of Kafka
In an interview with InfoQ, Horwitz points out that the new design is still under assessment, although they use Kafka heavily in production. He notes that there are claims that Kafka is not suitable for an CQRS, event-sourced system, but he thinks that if you understand the trade-offs, Kafka can be leveraged. If you save page views events, with different properties about the client, you can easily create aggregations based on that information, and this is one form of event sourcing where he thinks Kafka is a great fit.
By using the identity of an aggregate as the partition key, all commands for the same aggregate will end up in the same partition in the commands topic and will be processed in order, in a single thread. This way no command will be handled before the previous one has produced all downstream events, and Horwitz notes that this will create a strong consistency guarantee.