QCon London 2026: How To Run on Three Clouds at Once, and When Not To

At QCon London 2026, Kevin Holditch and Ross McFarlane from Form3 walked attendees through the reality of running a payments platform simultaneously across AWS, Google Cloud, and Azure. The talk was equal parts war story and cautionary tale: a candid look at what it actually takes to run active/active/active across three clouds, and why that approach doesn't always fit.

Form3 processes account-to-account payments for major UK banks, handling billions of pounds in annual transaction volume. The push toward multi-cloud began in 2021, when the UK's banking regulator raised concerns about cloud concentration risk, the idea that too many financial institutions depending on a single cloud provider could create systemic vulnerabilities. One of Form3's largest banking customers responded by demanding a multi-cloud strategy, and that requirement cascaded down to Form3.

Their original architecture was deeply coupled to AWS, relying on ECS, SQS, and RDS. The team had intentionally embraced that coupling when they were a handful of engineers who needed to ship fast. However, the new requirements forced a wholesale rethink. The V2 platform they built runs Kubernetes clusters independently in each of the three clouds, connected via private network links. They chose NATS JetStream as a cross-cloud message broker and CockroachDB for distributed data storage, both selected specifically because they could operate as single logical clusters spanning all three environments. The team also migrated from Java to Go for their microservices, citing smaller deployment footprints and better readability across repositories.

Holditch highlighted three engineering challenges that proved especially stubborn. Bootstrapping CockroachDB across independent Kubernetes clusters required a clever DNS hack: the team invented a pseudo-suffix scheme that inserts the cloud name into Kubernetes DNS addresses, with forwarding and rewrite rules to route queries between clusters. To protect the database quorum during node maintenance, they built a custom operator called XPDB (cross-cluster pod disruption budget) that enforces disruption limits across all three clouds rather than within each one individually. Finally, a painful day-two problem led them to build the Cluster Lifecycle Operator: keeping node pools updated across multiple clouds, environments, and geographies. The operator consolidated hundreds of pull requests into one per platform.
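The idea behind a cross-cluster disruption budget can be sketched as one admission check: before a node drain evicts a database pod in any cloud, count available replicas across all three clouds and refuse the eviction if it would drop the global total below a quorum-safe floor. A standard Kubernetes PodDisruptionBudget only sees pods in its own cluster, which is the gap XPDB addresses. The sketch below is hypothetical: the `ClusterStatus` type, the `AllowEviction` function, and the `minAvailable` value are illustrative assumptions, not Form3's XPDB API.

```go
package main

import "fmt"

// ClusterStatus is a hypothetical summary of the database pods in one cloud.
type ClusterStatus struct {
	Cloud    string
	Healthy  int // pods currently ready
	Evicting int // evictions already in flight in this cluster
}

// AllowEviction reports whether one more database pod may be disrupted
// anywhere while keeping at least minAvailable pods available across ALL
// clusters. A per-cluster budget would only sum its own cluster's pods.
func AllowEviction(clusters []ClusterStatus, minAvailable int) bool {
	available := 0
	for _, c := range clusters {
		available += c.Healthy - c.Evicting
	}
	// Evicting one more pod reduces global availability by one.
	return available-1 >= minAvailable
}

func main() {
	clusters := []ClusterStatus{
		{Cloud: "aws", Healthy: 3},
		{Cloud: "gcp", Healthy: 3},
		{Cloud: "azure", Healthy: 3},
	}
	// Hypothetical budget: at most one database pod down across all clouds.
	const minAvailable = 8

	fmt.Println(AllowEviction(clusters, minAvailable)) // true: 9 available, 8 after eviction

	clusters[1].Evicting = 1 // a GCP node drain is already in progress
	fmt.Println(AllowEviction(clusters, minAvailable)) // false: the AWS drain must wait
}
```

In a real operator this check would sit behind the eviction path and read live pod status from each cluster; the sketch only captures the global arithmetic.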

The payoff came during a major Google Cloud outage last summer. Holditch described checking his laptop and finding only a low-severity alert about some crash-looping pods in GCP, while payments continued to flow through the other clouds without interruption.

But the second half of the talk took an unexpected turn. McFarlane explained that when Form3 expanded into the US market, their state-of-the-art triple-active setup fell flat. American customers expected geographical resilience (an East Coast primary with West Coast disaster recovery) and found the multi-cloud pitch unfamiliar. Latency was also a hard constraint: spreading a CockroachDB quorum across the continent would add cross-country round trips to every write and eat into latency SLAs.

So Form3 stepped backward. They built an active-standby architecture with AWS on the East Coast and GCP on the West Coast, relying on backup-and-restore rather than real-time replication. Their first real incident came just two weeks after go-live, when an AWS outage knocked out their VPN connection to the payment scheme. The team debated failing over but ultimately waited for AWS to recover. That proved the right call in hindsight, though McFarlane admitted it didn't feel like it at the time.

They're now working to close the gap: adding CockroachDB logical replication between clouds and replicating NATS event streams to the standby site, which should dramatically reduce recovery time. They're also building per-customer failover capability so individual tenants can rehearse disaster recovery without disrupting others on the shared platform.
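The routing side of per-customer failover can be sketched as a per-tenant table of active sites: moving one tenant to the standby (for a DR rehearsal, say) changes only that tenant's entry. This is a hypothetical sketch; the `Router` type, its methods, and the site names are illustrative assumptions, and a real implementation would also need the tenant's data (replicated database rows and event streams) to be current at the standby before switching.

```go
package main

import (
	"fmt"
	"sync"
)

// Site identifies where a tenant's traffic is served.
type Site string

// Hypothetical site names for the East Coast primary and West Coast standby.
const (
	Primary Site = "aws-us-east"
	Standby Site = "gcp-us-west"
)

// Router keeps a per-tenant active site so that failing over one tenant
// leaves every other tenant on the shared platform untouched.
type Router struct {
	mu     sync.RWMutex
	active map[string]Site // only tenants that have been failed over
}

func NewRouter() *Router { return &Router{active: make(map[string]Site)} }

// SiteFor returns the tenant's active site, defaulting to the primary.
func (r *Router) SiteFor(tenant string) Site {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if s, ok := r.active[tenant]; ok {
		return s
	}
	return Primary
}

// FailOver moves a single tenant to the standby site.
func (r *Router) FailOver(tenant string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.active[tenant] = Standby
}

// FailBack returns a single tenant to the primary site.
func (r *Router) FailBack(tenant string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.active, tenant)
}

func main() {
	r := NewRouter()
	r.FailOver("bank-a") // rehearse DR for one customer only
	fmt.Println(r.SiteFor("bank-a"), r.SiteFor("bank-b")) // gcp-us-west aws-us-east
	r.FailBack("bank-a")
	fmt.Println(r.SiteFor("bank-a")) // aws-us-east
}
```

Storing only the exceptions keeps the default (primary) implicit, so a newly onboarded tenant never needs an explicit routing entry.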

Holditch closed with three pillars that made multi-cloud work in the UK: cloud-agnostic technology choices, single logical data stores across clouds, and treating each cloud provider as an availability zone. But he was equally direct about when not to bother. If your market doesn't value it, if your budget can't sustain it, or if you lack a strong platform engineering team to run it, triple-active multi-cloud is probably not worth the effort. As he put it: "bankruptcy is kind of incompatible with uptime."

About the Author

Steef-Jan Wiggers
