Mistakes and Recoveries When Building an Event Sourcing System

When Nat Pryce and his team started building a new system based on an event sourced architecture, they made a couple of significant design mistakes, but recovered from them with an ease that surprised them. In a blog post, Pryce describes these mistakes and the factors that made it possible for the team to refactor the architecture and recover.

Their first mistake was persisting a view of each entity's current state alongside the event history. The current state was not a projection derived from the events; instead, it was updated directly by the command handler that recorded the events. This introduced two problems: the state of an entity could not be rebuilt from the recorded events, and managing migrations in the relational model they used for the current state proved to be a significant overhead.
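To illustrate what rebuilding state from recorded events means, here is a minimal sketch in Python. It is not taken from Pryce's system: the `Deposited` and `Withdrawn` events, the `Account` state and the `apply` and `rebuild` functions are hypothetical names chosen for the example. The point is only that the current state is a pure fold over the event history, so it can always be recomputed.

```python
from dataclasses import dataclass
from functools import reduce


# Hypothetical events for an illustrative bank account entity.
@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


@dataclass(frozen=True)
class Account:
    balance: int = 0


def apply(state, event):
    """Return the state that results from applying one recorded event."""
    if isinstance(event, Deposited):
        return Account(state.balance + event.amount)
    if isinstance(event, Withdrawn):
        return Account(state.balance - event.amount)
    return state  # unknown event types leave the state unchanged


def rebuild(events):
    """Fold the whole history into the current state, starting from empty."""
    return reduce(apply, events, Account())


history = [Deposited(100), Withdrawn(30), Deposited(5)]
current = rebuild(history)
```

When a command handler writes the state itself, as in the team's first design, nothing guarantees that a fold like `rebuild` would reproduce the stored state.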

Pryce, co-author of the book Growing Object-Oriented Software Guided by Tests, admits that keeping the two persistence mechanisms side by side in a way missed the whole point of event sourcing. The mistake arose because they settled on a design the team was comfortable with, without reflecting on how it differed from the recommendations in the event sourcing literature. They continued with this design until the difficulties clearly outweighed the benefits. They then held a technical retrospective and agreed to move to a canonical event sourcing design.

Their next mistake was confusing event-driven and event-sourced architecture. In an event-driven architecture, components perform tasks in response to received events, and emit events to notify others about changes in state. In event sourcing, state changes are recorded as events and the current state of an entity is calculated from all events related to that entity. This confusion led to a design where a component both recorded events in the history and triggered activities in other components. They realized their mistake when they had to implement logic in events to distinguish between a) reading an event and reacting to it, and b) reading an event to know what happened in the past.

This confusion also led to a design where they used the event store as a message bus. They started to emit notifications so that components could keep their projections up to date, which meant they used the event store both for the event history and for transient communication between components. The result was an event store containing technical events that had to be filtered out of the history before it was displayed to consumers.

The last mistake Pryce describes was the use of an HTTP interface for reading and storing events in the event store. This prevented the team from processing events in ACID transactions and forced them to build other mechanisms to compensate.
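As a rough illustration of what a direct database connection offers here, the following sketch appends several events in a single transaction using Python's standard `sqlite3` module. The table layout and the `open_store`, `append_events` and `count_events` helpers are assumptions made for this example and do not describe the team's actual schema or database. Either every event in a batch is stored, or none is.

```python
import sqlite3


def open_store():
    """Create an in-memory event table (hypothetical schema for this sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        " entity_id TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (entity_id, version))"
    )
    return conn


def append_events(conn, entity_id, expected_version, payloads):
    """Append all payloads atomically: commit on success, roll back on error."""
    with conn:  # the connection context manager commits or rolls back
        for offset, payload in enumerate(payloads, start=1):
            conn.execute(
                "INSERT INTO events (entity_id, version, payload)"
                " VALUES (?, ?, ?)",
                (entity_id, expected_version + offset, payload),
            )


def count_events(conn, entity_id):
    row = conn.execute(
        "SELECT COUNT(*) FROM events WHERE entity_id = ?", (entity_id,)
    ).fetchone()
    return row[0]


conn = open_store()
append_events(conn, "order-1", 0, ["Created", "Paid"])

# The second payload violates NOT NULL, so the whole batch is rolled back,
# including the first insert that had already succeeded.
try:
    append_events(conn, "order-1", 2, ["Shipped", None])
    rolled_back = False
except sqlite3.IntegrityError:
    rolled_back = True
```

In this sketch the primary key on `(entity_id, version)` also rejects a writer that appends from a stale version, which is one common way to get optimistic concurrency from the same transaction.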

Fortunately, they discovered all their mistakes early on, before the event history in their live system was affected. The HTTP interface was replaced with direct database connections for their command processors. They stopped using notifications and went back to using REST for passing data between components. Finally, they moved away from updating entities’ current state in command handlers. Instead, the state is computed from the event history when an entity is loaded. They still use a projection of the current state of entities, but this is treated as a read-through cache, purely for optimization.
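One way to picture a projection used as a read-through cache is sketched below. This is an assumption about how such a cache could work, not a description of the team's implementation; `CachedLoader`, `read_events_after` and the integer "events" are hypothetical. The cache remembers the state as of a given version, and each load only folds in the events recorded since then. Because the event history remains the source of truth, the cache can be dropped at any time without losing information.

```python
class CachedLoader:
    """Loads entity state from events, caching the last computed state."""

    def __init__(self, read_events_after, apply, initial):
        self._read_events_after = read_events_after  # (entity_id, version) -> events
        self._apply = apply                          # (state, event) -> state
        self._initial = initial
        self._cache = {}                             # entity_id -> (version, state)

    def load(self, entity_id):
        # Start from the cached state if present, otherwise from scratch.
        version, state = self._cache.get(entity_id, (0, self._initial))
        for event in self._read_events_after(entity_id, version):
            state = self._apply(state, event)
            version += 1
        self._cache[entity_id] = (version, state)
        return state

    def drop_cache(self):
        """Safe at any time: state is always recoverable from the events."""
        self._cache.clear()


# Toy event log: each event is just a number to add to a counter.
log = {"counter-1": [1, 2, 3]}


def read_events_after(entity_id, version):
    return log.get(entity_id, [])[version:]


loader = CachedLoader(read_events_after, lambda state, event: state + event, 0)
first = loader.load("counter-1")
log["counter-1"].append(4)
second = loader.load("counter-1")   # only the new event is applied
loader.drop_cache()
rebuilt = loader.load("counter-1")  # recomputed from the full history
```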

Pryce concludes by noting that although they made significant changes to the architecture, the changes were straightforward, and he points out that the reasons for this are orthogonal to event sourcing.

The application has a ports and adapters architecture, which makes it easy to change an implementation when it is hidden behind a port or an adapter interface. They also have extensive functional tests, written to take advantage of this architecture. As a result, the technical architecture is separated from the implementation of the functional behaviour, which simplified making the architectural changes.
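A small sketch may help show why this separation makes such changes cheap. The `EventStore` port, the `InMemoryEventStore` adapter and the customer-renaming behaviour below are all hypothetical and only illustrate the style. The functional check is written against the port, so it would stay the same if the adapter were swapped, for example from an HTTP client to a direct database connection.

```python
from typing import Protocol


class EventStore(Protocol):
    """Port: what the domain logic needs from any event store."""

    def append(self, entity_id: str, event: dict) -> None: ...

    def read(self, entity_id: str) -> list: ...


class InMemoryEventStore:
    """Adapter: a trivial implementation, useful for functional tests."""

    def __init__(self):
        self._events = {}

    def append(self, entity_id, event):
        self._events.setdefault(entity_id, []).append(event)

    def read(self, entity_id):
        return list(self._events.get(entity_id, []))


def rename_customer(store: EventStore, customer_id: str, new_name: str) -> None:
    """Command handler: records what happened and nothing else."""
    store.append(customer_id, {"type": "CustomerRenamed", "name": new_name})


def current_name(store: EventStore, customer_id: str):
    """Query: derives the current name from the recorded events."""
    name = None
    for event in store.read(customer_id):
        if event["type"] == "CustomerRenamed":
            name = event["name"]
    return name


def check_rename_behaviour(store: EventStore):
    """Functional check that depends only on the port, not on an adapter."""
    rename_customer(store, "c-1", "Ada")
    rename_customer(store, "c-1", "Grace")
    return current_name(store, "c-1")


result = check_rename_behaviour(InMemoryEventStore())
```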

For Pryce, it’s inevitable that mistakes will be made when you adopt an unfamiliar architectural style in a system. He believes, though, that the ports and adapters style not only allowed them to adopt event sourcing despite their lack of experience, but also to recover from their misunderstandings while building the system.
