Key Takeaways
- Event-driven architecture is neither a shortcut nor a free win. It introduces new forms of complexity, new failure modes, and a fundamentally different way of thinking about system design.
- In highly regulated environments, reliability patterns such as inboxes, outboxes, idempotent consumers, and explicit fault handling are not optional. They are essential if you want to avoid lost or duplicated events.
- Separating domain events from integration events helps protect internal models from leaking across boundaries, and gives systems room to evolve independently without breaking consumers.
- Event-driven systems can offer real operational benefits: strong decoupling, natural audit trails, and the ability to add new capabilities by subscribing to existing event streams rather than modifying core platforms.
- Successful adoption depends as much on organisational investment as it does on technology. Shared standards, strong developer platforms, and hands-on training all matter.
When discussing event-driven architectures in the context of cloud platforms and highly regulated industries, it helps to start with a shared foundation. This topic attracts people with very different backgrounds, from engineers who have lived through the realities of distributed systems to those encountering these ideas for the first time. That shared foundation matters because event-driven architecture introduces concepts that sound simple at first, but are easy to misuse if they are not clearly understood from the outset.
This article is based on my session at InfoQ Dev Summit Munich 2025 and reflects lessons learned from building and operating event-driven, cloud-native systems in banking. I will start with the fundamentals, then move into why this architectural style is attractive in a regulated environment, what it can realistically deliver, and where it introduces risk and complexity. I will also cover the patterns and practices that helped us make these systems work in production, not just in theory.
With the fundamentals in place, it becomes much easier to understand not only what event-driven architecture is, but when it is appropriate, what trade-offs it brings, and why it can still be the right choice for banking systems despite the constraints that regulation introduces.
What is an Event?
An event is a state change that occurs somewhere within a system. That change may be triggered by a user action, an asynchronous background process, or an external system interacting with the platform. Events can carry data that describes what happened, or they can act as lightweight notifications that simply signal that something has occurred.
There is ongoing discussion in the industry about how much data an event should contain. Some advocate for very minimal payloads, while others prefer richer messages. My view is fairly simple: an event should include the data that directly relates to the state change it represents, and nothing more. If a piece of information is not inherent to that state change, it probably does not belong in the event. Keeping events lean makes them easier to evolve and reduces unnecessary coupling between systems.
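To make the "lean payload" idea concrete, here is a minimal sketch of an event type in Python. The event name and fields are illustrative, not a prescribed schema: only data inherent to the state change (which payment, how much, when) is included, while consumer-specific details such as customer contact data deliberately stay out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
import uuid

# Hypothetical lean event: only data inherent to the state change.
# Details a particular consumer might want (customer address,
# notification text, ...) are deliberately left out of the payload.
@dataclass(frozen=True)
class PaymentCompleted:
    payment_id: str
    amount: Decimal
    currency: str
    occurred_at: str  # ISO 8601 timestamp of the state change
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

event = PaymentCompleted(
    payment_id="pay-123",
    amount=Decimal("49.99"),
    currency="EUR",
    occurred_at=datetime.now(timezone.utc).isoformat(),
)
print(event.payment_id)  # pay-123
```

Keeping the type frozen reinforces that an event is an immutable statement of fact, not a mutable record.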
Commands and Events Are Not the Same
One of the most important distinctions in event-driven systems is the difference between commands and events. It is also one of the most commonly misunderstood. When teams blur this line, they often end up with architectures that look event-driven on the surface but do not deliver the benefits they were aiming for.
A command is an explicit request for action. It is intentional and directional: one system tells another system to perform a specific operation. Even when handled asynchronously, a command still carries the expectation that something should happen as a result.
An event is different. It is a statement of fact. It says that something has already happened. The system publishing the event is not asking for further action and is not expecting a response. There may be multiple consumers, a single consumer, or no consumers at all. All of those outcomes are valid.
That distinction matters. Treating events as commands in disguise creates tighter coupling and erodes long-term flexibility. If you want systems to remain adaptable, you need to be deliberate about when you issue commands and when you publish events.
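The distinction can be made concrete with a toy in-memory bus (all names here are illustrative): a command has exactly one handler and implies an expected action, while an event is broadcast to zero or more subscribers with no response expected.

```python
from collections import defaultdict

# Illustrative sketch, not a production bus: commands are directed
# requests with exactly one handler; events are facts fanned out to
# however many subscribers exist, including none.
class MessageBus:
    def __init__(self):
        self._command_handlers = {}                   # command type -> one handler
        self._event_subscribers = defaultdict(list)   # event type -> many handlers

    def register_command(self, command_type, handler):
        if command_type in self._command_handlers:
            raise ValueError(f"{command_type} already has a handler")
        self._command_handlers[command_type] = handler

    def subscribe(self, event_type, handler):
        self._event_subscribers[event_type].append(handler)

    def send(self, command_type, payload):
        # A command must be handled; a missing handler is an error.
        return self._command_handlers[command_type](payload)

    def publish(self, event_type, payload):
        # An event may have many subscribers, or none at all.
        for handler in self._event_subscribers[event_type]:
            handler(payload)

bus = MessageBus()
bus.register_command("ExecutePayment", lambda p: f"paid {p['amount']}")
log = []
bus.subscribe("PaymentCompleted", lambda e: log.append(f"notify {e['id']}"))
bus.subscribe("PaymentCompleted", lambda e: log.append(f"monitor {e['id']}"))

print(bus.send("ExecutePayment", {"amount": 10}))  # paid 10
bus.publish("PaymentCompleted", {"id": "p-1"})     # fans out to both
bus.publish("AccountClosed", {"id": "a-9"})        # no subscribers: still valid
print(log)
```

Note the asymmetry: `send` returns a result because the sender expects something to happen, while `publish` returns nothing and succeeds even with zero subscribers.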
What Event-Driven Architecture Actually Means
With those definitions in place, event-driven architecture becomes easier to describe. It is an architectural style in which systems communicate by publishing and reacting to events. Producers emit events to an eventing platform, and consumers subscribe to the events they care about. The producer does not know, and does not need to know, who consumes those events, or whether anyone does at all.
Event-driven architecture is often confused with event sourcing, but they are not the same thing. Event sourcing is a specific way of modelling application state as an immutable sequence of events rather than storing the current state directly. For example, instead of recording that a shopping cart contains four items, an event-sourced system stores four separate "item added" events and derives the current state by replaying them.
Event sourcing can be powerful, but it is also complex and difficult to implement well. More importantly, it is not a prerequisite for event-driven architecture. The two are often discussed together because event-sourced systems naturally expose events, but adopting event-driven communication does not mean taking on event sourcing. That is a separate decision, with its own cost.
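The shopping-cart example above can be sketched in a few lines (event shapes are illustrative): the cart's state is never stored directly; it is derived by replaying the immutable event sequence.

```python
# Event-sourcing sketch: instead of storing "the cart has four items",
# the system stores the events and replays them to derive current state.
events = [
    {"type": "ItemAdded", "sku": "book"},
    {"type": "ItemAdded", "sku": "pen"},
    {"type": "ItemAdded", "sku": "mug"},
    {"type": "ItemAdded", "sku": "lamp"},
]

def replay(events):
    cart = []
    for e in events:
        if e["type"] == "ItemAdded":
            cart.append(e["sku"])
        elif e["type"] == "ItemRemoved":
            cart.remove(e["sku"])
    return cart

print(len(replay(events)))  # 4
```

Even this toy hints at the hidden costs: every read requires a replay (or a maintained projection), and the event shapes become load-bearing, which is part of why event sourcing is a separate decision from event-driven communication.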
Cloud Native in Practice
Cloud native is not simply about running workloads in the cloud. The cloud can support almost any operating model. Cloud-native systems, by contrast, embrace modern engineering practices. They are designed to scale, to be deployed frequently, and to be operated through automation rather than manual intervention.
These systems are commonly built using microservices, but modular monoliths can work well too. The structural choice is not the important part. What matters is the adoption of modern DevOps practices such as continuous integration, continuous deployment, and infrastructure as code. In that context, event-driven architectures are a natural fit. They support asynchronous communication, loose coupling, and independent deployment, which aligns well with how cloud-native systems are built and operated.
Banking as a Constraint
Banking adds a constraint to this picture. Banks are large, highly regulated organisations with a fundamental responsibility to protect customer funds. Regulation shapes not only technical decisions, but also organisational culture, risk tolerance, and the pace of change. It is not surprising that this often leads to caution around newer architectural patterns.
At the same time, modern engineering practices are no longer optional. The scale and complexity of financial systems continue to grow, along with customer expectations. At Investec, an Anglo-South African international banking and wealth management group, we have deliberately adopted cloud-native and event-driven approaches while still meeting strict regulatory requirements.
Why Event-Driven Architecture Matters
Once the foundations are clear, the value of event-driven architecture becomes much easier to see. The benefits are not theoretical. They show up directly in real production use cases in banking.
Decoupling is one of the clearest examples, especially in payment processing. Consider transaction monitoring. Monitoring account activity for suspicious behaviour is essential, but it should not sit in the critical payment execution path. Payments need to be highly reliable. Monitoring can happen asynchronously. By publishing payment events and allowing monitoring systems to consume them independently, the two concerns remain isolated. This means payments can continue even if monitoring is temporarily unavailable, and the monitoring capability can evolve without destabilising the payment flow.

Figure 1: Decoupling in a banking system
Event-driven systems also create an immutable activity log. In complex payment flows, events become the authoritative record of what actually happened, not just a secondary audit trail. That activity log gives teams visibility into payment lifecycles, improves troubleshooting, and helps meet regulatory expectations around traceability.
Fan-out is another strong advantage. A single event, such as a completed payment, can trigger several independent processes: updating payment limits, sending customer notifications, or starting downstream reconciliation flows. Each consumer can manage its own failures and retries independently, which keeps the core payment flow simpler.

Figure 2: Fan-out in a banking system
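The fan-out above can be sketched as independent consumers of a single event, each owning its own failures (consumer names are hypothetical). A failure in one consumer does not block the others, which is what keeps the core payment flow simple.

```python
# Fan-out sketch with independent failure handling: one PaymentCompleted
# event drives several processes; one consumer failing does not stop the rest.
def update_limits(event):
    return f"limits updated for {event['payment_id']}"

def notify_customer(event):
    raise ConnectionError("notification service unavailable")

def start_reconciliation(event):
    return f"reconciliation started for {event['payment_id']}"

consumers = [update_limits, notify_customer, start_reconciliation]

def fan_out(event):
    results, failures = [], []
    for consumer in consumers:
        try:
            results.append(consumer(event))
        except Exception as exc:
            # In production this consumer would retry or dead-letter on its
            # own schedule; the other consumers carry on regardless.
            failures.append((consumer.__name__, str(exc)))
    return results, failures

results, failures = fan_out({"payment_id": "pay-42"})
print(results)   # two consumers succeeded
print(failures)  # the notification failure is isolated
```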
Fault tolerance is especially important in regulated environments, particularly when unreliable external dependencies are involved, such as third-party fraud engines. Event-driven architectures support layered retry strategies, controlled back-off, and dead-lettering when automatic recovery is no longer possible. That stops poison events from destabilising the wider system and allows failures to be handled more safely.
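Those layers can be sketched as follows (illustrative code; real eventing platforms typically provide retries and dead-letter queues as configuration rather than hand-written loops): retry with exponential back-off, then park the event instead of looping forever.

```python
import time

DEAD_LETTERS = []

# Sketch of layered fault handling: a bounded retry budget with
# exponential back-off, then dead-lettering so a poison event is
# parked for inspection instead of destabilising the consumer.
def process_with_retries(handler, event, max_attempts=3, base_delay=0.01):
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(event)
        except Exception as exc:
            if attempt == max_attempts:
                DEAD_LETTERS.append({"event": event, "error": str(exc)})
                return None
            # Back off before retrying: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))

def flaky_fraud_check(event):
    raise TimeoutError("fraud engine unavailable")

process_with_retries(flaky_fraud_check, {"payment_id": "pay-7"})
print(DEAD_LETTERS)  # the event is parked, not retried forever
```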
Finally, mature event-driven platforms enable genuine plug-and-play capabilities. New features, such as rewards programs, can be built by subscribing to existing event streams rather than modifying core systems. When events are designed well and kept stable, you can add capabilities quickly without tight coupling to the systems that originated them.
What Hurts — and What Actually Helps
Event-driven architectures can look close to ideal in theory, but reality catches up quickly. There are real pain points, and in highly regulated environments such as banking, ignoring them can have serious consequences. The good news is that these challenges are well understood, and there are proven ways to address them.
The Human Challenge
The biggest challenge with event-driven architectures is not technical. It is the change in mindset. Engineers need to think in terms of asynchronous communication, eventual consistency, and independent fault handling rather than defaulting to familiar synchronous request-response patterns.
We saw this very clearly in practice. In one area where we introduced both event-driven architecture and event sourcing, new team members typically took around six months to deliver at the same pace as more experienced colleagues. Early on, teams often over-engineer problems that are no longer central, while underestimating the issues that matter most in distributed systems: consistency, retries, and failure handling. The organisational impact of these challenges is significant and should not be overlooked.
Tooling and Enablement
Addressing the human challenge requires more than good intentions. It requires deliberate tooling and enablement. Investing early in a strong developer platform, with well-designed templates and shared modules, makes a material difference. Patterns such as paved paths help new teams adopt event-driven microservices without repeatedly reinventing the wheel.
Even a strong platform is not enough on its own. Training matters. Giving teams powerful tools without teaching them how those systems behave in production is risky, especially when failures like to happen at 2 am. To address this, we paired enablement teams with delivery teams so they could design and build small, practical event-driven systems together, and then take them into production. That hands-on approach was far more effective than relying on documentation alone. It gave teams the confidence to build independently.
Early alignment on standards is equally important. Agreeing upfront on event contracts, permissions, and core technologies helps avoid fragmentation and makes consuming events across the organisation far more straightforward.
Protecting Against Lost or Duplicate Events
Another major challenge is reliability. In banking, losing events or duplicating them is simply unacceptable. Missing a rent payment or paying a deposit twice is not a minor edge case. These risks need to be addressed in the design itself.
We rely on outbox and inbox patterns to reduce those risks. The outbox pattern ensures that state changes and event publication are captured within the same transactional boundary, which prevents lost events. A dispatcher can then publish those events reliably to the eventing platform.

Figure 3: Outbox and inbox patterns
Outboxes alone do not solve duplication, because eventing systems often guarantee only at-least-once delivery. The inbox pattern on the consumer side addresses that problem. Each event is recorded before business logic runs, and duplicates are ignored. Together, these patterns protect against both loss and duplication without forcing every engineer to solve the same reliability problem repeatedly.
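Both patterns can be sketched with in-memory SQLite (table and column names are illustrative, not our actual schema). The producer writes its state change and the outgoing event in one transaction, so an event is never lost; the consumer records each event id before running business logic, so at-least-once delivery never causes duplicate processing.

```python
import sqlite3
import uuid

# Producer side: state change and outbox row share one transaction.
producer = sqlite3.connect(":memory:")
producer.executescript("""
    CREATE TABLE payments (id TEXT PRIMARY KEY, amount REAL);
    CREATE TABLE outbox   (event_id TEXT PRIMARY KEY, payload TEXT);
""")

# Consumer side: the inbox deduplicates before business logic runs.
consumer = sqlite3.connect(":memory:")
consumer.executescript("""
    CREATE TABLE inbox  (event_id TEXT PRIMARY KEY);
    CREATE TABLE ledger (entry TEXT);
""")

def record_payment(payment_id, amount):
    event_id = str(uuid.uuid4())
    with producer:  # one transaction: both rows commit or neither does
        producer.execute("INSERT INTO payments VALUES (?, ?)", (payment_id, amount))
        producer.execute("INSERT INTO outbox VALUES (?, ?)",
                         (event_id, f"PaymentCompleted:{payment_id}"))
    return event_id

def handle_event(event_id, payload):
    with consumer:
        try:  # the primary key rejects an event id seen before
            consumer.execute("INSERT INTO inbox VALUES (?)", (event_id,))
        except sqlite3.IntegrityError:
            return "duplicate ignored"
        consumer.execute("INSERT INTO ledger VALUES (?)", (payload,))
        return "processed"

eid = record_payment("pay-1", 250.0)
# A dispatcher would read the outbox and publish; delivery may repeat:
print(handle_event(eid, "PaymentCompleted:pay-1"))  # processed
print(handle_event(eid, "PaymentCompleted:pay-1"))  # duplicate ignored
count = consumer.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]
print(count)  # exactly 1, despite two deliveries
```

The essential property is that each side relies only on its own local transaction, which is exactly what makes the combination robust across an unreliable broker.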
Event Contracts Are Permanent
Events help decouple systems, but they also create durable contracts. Once an event is published, it can persist indefinitely and may be replayed from any point in time. Removing or changing fields in an event can break consumers, either with loud errors or, worse, with quiet regressions in their handling logic. Rewriting historical events to patch over mistakes undermines the architecture and should be avoided.
The safest way to think about events is as public APIs. Define them carefully, on the assumption that consumers will use them in ways you did not anticipate. Breaking changes should be avoided wherever possible. When they are unavoidable, events should be versioned. Including a version identifier in metadata allows consumers to handle multiple versions safely and replay streams without disruption.
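Version-tolerant consumption can be sketched like this (the event shape and field names are hypothetical): the metadata carries a version, and the consumer upgrades older shapes to the current one before handling, so replayed streams stay processable alongside new events.

```python
# Sketch of version-tolerant handling: old event shapes are upgraded to
# the current version before business logic sees them.
def upgrade(event):
    version = event.get("version", 1)
    data = dict(event["data"])
    if version == 1:
        # Hypothetical breaking change: v1 carried one "name" field,
        # v2 split it into first and last name.
        first, _, last = data.pop("name").partition(" ")
        data["first_name"], data["last_name"] = first, last
    return {"version": 2, "data": data}

def handle_beneficiary_added(event):
    data = upgrade(event)["data"]
    return f"{data['first_name']} {data['last_name']}"

old = {"version": 1, "data": {"name": "Ada Lovelace"}}
new = {"version": 2, "data": {"first_name": "Ada", "last_name": "Lovelace"}}
print(handle_beneficiary_added(old))  # Ada Lovelace
print(handle_beneficiary_added(new))  # Ada Lovelace
```

Keeping the upgrade step at the edge of the consumer means the rest of the handling logic only ever deals with one canonical shape.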
Separating domain events from integration events adds another layer of protection. Domain events, which are specific to a bounded context, can evolve more freely. Integration events, which cross contexts, need to stay stable and well-defined. That separation helps prevent internal concepts from leaking into external contracts and becoming painful to change later.
Event Ordering Requires Attention
Event ordering is a subtle but important concern. Most cloud-native eventing platforms prioritise scalability over strict ordering. Retries, back-off, and parallel processing can all result in events arriving out of sequence. In some domains, that is harmless. In others, it is a serious problem.
There are two effective ways to approach this. The first is explicit ordering, where events carry a version or sequence number tied to an aggregate. Consumers then enforce ordering through inbox logic, processing events only when earlier events have already been received. This works, but it can reduce scalability.
The second is implicit ordering, where the domain itself enforces valid state transitions. For example, a payment cannot be processed until the associated beneficiary exists. Implicit ordering preserves correctness without explicit sequencing, and it often scales better. Both approaches are valid. The key is to make ordering an intentional design decision, rather than discovering in production that it mattered more than expected.
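The explicit approach can be sketched as inbox logic that parks out-of-order events until their turn comes (aggregate ids and payloads are illustrative): each event carries a per-aggregate sequence number, and the consumer only applies an event once every earlier one has been processed.

```python
from collections import defaultdict

# Sketch of explicit ordering enforced in the inbox: out-of-order events
# are buffered per aggregate and released once the gap before them closes.
class OrderedInbox:
    def __init__(self):
        self.next_seq = defaultdict(lambda: 1)  # expected sequence per aggregate
        self.buffer = defaultdict(dict)         # parked out-of-order events
        self.processed = []

    def receive(self, aggregate_id, seq, payload):
        if seq != self.next_seq[aggregate_id]:
            self.buffer[aggregate_id][seq] = payload  # park until its turn
            return
        self._apply(aggregate_id, seq, payload)
        # Drain any buffered events that are now in sequence.
        while self.next_seq[aggregate_id] in self.buffer[aggregate_id]:
            s = self.next_seq[aggregate_id]
            self._apply(aggregate_id, s, self.buffer[aggregate_id].pop(s))

    def _apply(self, aggregate_id, seq, payload):
        self.processed.append((aggregate_id, seq, payload))
        self.next_seq[aggregate_id] = seq + 1

inbox = OrderedInbox()
inbox.receive("acct-1", 2, "debited")  # arrives early: parked, not applied
inbox.receive("acct-1", 1, "opened")   # applied, then releases the buffered event
print([p for (_, _, p) in inbox.processed])  # ['opened', 'debited']
```

The buffering is also where the scalability cost shows up: parked events consume memory or storage, and a permanently missing sequence number stalls the aggregate, which is why the implicit approach is often preferable when the domain allows it.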
Bringing It All Together
The event-driven architecture shown combines domain and integration events with inbox and outbox patterns to create systems that are resilient, scalable, and auditable. Every meaningful state change is captured as an event and published reliably alongside data persistence, while consumers process events through inboxes to ensure idempotency. That design helps prevent both lost and duplicated events, which is essential in a regulated banking environment.

Figure 4: Complete architecture
Separating domain events from integration events has been particularly important. Domain events are free to evolve within bounded contexts, while integration events act as stable, versioned contracts between systems. By filtering, aggregating, and transforming events at those boundaries, we avoid leaking internal domain concepts and allow systems to change independently without breaking consumers.
The decoupling also provides tangible operational benefits. Payments can continue even when downstream services are unavailable, and new capabilities can be added by subscribing to existing event streams rather than modifying core platforms. Events also form a natural audit trail, giving end-to-end visibility into transaction lifecycles and supporting both troubleshooting and regulatory requirements.
Embedding these patterns into a shared developer platform has enabled more consistent adoption. Service templates, shared modules, and built-in reliability patterns allow teams to focus on business logic instead of repeatedly solving infrastructure concerns. At the same time, training ensures engineers understand how these systems behave in production, not just how to scaffold them.
Conclusion
Event-driven architecture is neither a shortcut nor a free win. It introduces new forms of complexity, new failure modes, and a fundamentally different way of thinking about system design. But when it is applied deliberately, with clear event contracts, proven reliability patterns, and strong support for engineering teams, it becomes a strong foundation for building modern banking platforms.
In our experience, this approach has allowed us to meet strict regulatory requirements while still embracing cloud-native agility. It supports systems that are resilient, extensible, and transparent. Done well, event-driven architecture is more than a technical decision. It is an operational and organisational commitment that shapes how teams build, run, and evolve critical systems.