Inside Google’s System for Coordinated A/B Testing Across Its Global Service Fleet

Google has detailed how it runs large-scale A/B experiments across its fleet of services, describing an internal system built to support consistent, reliable experimentation across products that operate at massive scale. The approach focuses on letting teams run experiments safely across distributed infrastructure while maintaining statistical rigor and minimizing interference between experiments.

At its core, the system addresses a common challenge in large organizations operating many interconnected services: ensuring that experiments produce trustworthy causal signals when traffic spans multiple layers of infrastructure, user surfaces, and backend systems. As experimentation becomes more pervasive across product development, inconsistencies in assignment, overlapping experiments, and fragmented telemetry can degrade the quality of insights. Google's approach is designed to standardize experiment allocation and measurement across this fleet of services.

The system provides a centralized experimentation framework that coordinates how users or requests are assigned to experimental variants. Rather than relying on isolated implementations per product or service, Google uses shared infrastructure that manages experiment configuration, assignment logic, and exposure logging. This helps ensure that users are consistently bucketed into experiment groups even when they interact with multiple services or features participating in different experiments.

 

Infrastructure Experiment Process at Google (Source: Google Blog Post)

A key component is a unified assignment layer that determines how traffic is allocated across experiments. This layer supports hierarchical allocation, allowing experimentation at different levels of the stack while reducing conflicts between overlapping tests. It also ensures that the assignment is deterministic for a given user or session, which is important for avoiding contamination between variants and for maintaining stable experimental exposure over time.
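As an illustration of deterministic, layered assignment, the following is a minimal sketch and not Google's implementation. It assumes hash-based bucketing, where each layer has its own salt and experiments within a layer own disjoint bucket ranges. The names `bucket`, `assign`, and `UI_LAYER`, along with the layer schema, are hypothetical.

```python
import hashlib


def bucket(unit_id: str, salt: str, num_buckets: int) -> int:
    """Map a unit (user or session ID) to a stable bucket in [0, num_buckets)."""
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def assign(unit_id: str, layer: dict):
    """Return (experiment_name, variant) for this unit in the layer, or None.

    Experiments in one layer own disjoint bucket ranges, so a unit can be in
    at most one experiment per layer. Different layers use different salts,
    so their bucketing is effectively independent.
    """
    b = bucket(unit_id, layer["salt"], layer["num_buckets"])
    for exp in layer["experiments"]:
        low, high = exp["buckets"]
        if low <= b < high:
            # A second hash, salted with the experiment name, picks the variant.
            idx = bucket(unit_id, exp["name"], len(exp["variants"]))
            return exp["name"], exp["variants"][idx]
    return None  # Unit is not in any experiment in this layer.


# Hypothetical layer definition (an assumption for illustration only).
UI_LAYER = {
    "salt": "ui-layer",
    "num_buckets": 1000,
    "experiments": [
        {"name": "new_button", "buckets": (0, 100), "variants": ["control", "treatment"]},
        {"name": "new_font", "buckets": (100, 300), "variants": ["control", "treatment"]},
    ],
}

if __name__ == "__main__":
    # The same unit always receives the same answer, on any service.
    print(assign("user-42", UI_LAYER))
```

Because the result depends only on the unit ID and the configuration, any service holding the same configuration computes the same assignment without coordinating with other services.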

To support correctness in measurement, the system emphasizes exposure logging that captures when and how users are actually exposed to experimental treatments. This enables downstream analysis systems to distinguish between assigned and truly exposed populations, improving the reliability of metrics. The platform also integrates guardrails to prevent experiments from exceeding configured traffic limits or violating safety constraints.
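The distinction between assigned and exposed populations, and a simple traffic guardrail, can be sketched as follows. This is a hedged illustration under assumed semantics: the `ExposureLog` class, the `check_traffic_limit` function, and their behavior are hypothetical and do not come from Google's description.

```python
from dataclasses import dataclass, field


@dataclass
class ExposureLog:
    """Tracks which units were assigned a variant and which actually saw it."""
    assigned: dict = field(default_factory=dict)  # unit_id -> variant
    exposed: set = field(default_factory=set)     # unit_ids that hit the treated code path

    def record_assignment(self, unit_id: str, variant: str) -> None:
        self.assigned[unit_id] = variant

    def record_exposure(self, unit_id: str) -> None:
        # Only units with a known assignment can be logged as exposed.
        if unit_id not in self.assigned:
            raise KeyError(f"{unit_id} has no assignment")
        self.exposed.add(unit_id)

    def exposed_counts(self) -> dict:
        """Count exposed units per variant; analysis would use these, not raw assignments."""
        counts: dict = {}
        for unit_id in self.exposed:
            variant = self.assigned[unit_id]
            counts[variant] = counts.get(variant, 0) + 1
        return counts


def check_traffic_limit(allocations: dict, max_fraction: float) -> None:
    """Guardrail: reject a config whose total traffic share exceeds the cap."""
    total = sum(allocations.values())
    if total > max_fraction:
        raise ValueError(f"total allocation {total:.2f} exceeds limit {max_fraction:.2f}")


if __name__ == "__main__":
    log = ExposureLog()
    log.record_assignment("u1", "treatment")
    log.record_assignment("u2", "treatment")
    log.record_assignment("u3", "control")
    log.record_exposure("u1")  # u2 was assigned but never reached the feature
    log.record_exposure("u3")
    print(log.exposed_counts())
    check_traffic_limit({"new_button": 0.05, "new_font": 0.10}, max_fraction=0.20)
```

In this sketch, a unit that was assigned but never reached the treated code path is excluded from the exposed counts, which is the population the article says downstream analysis needs to isolate.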

Google also highlights the importance of configuration propagation across its infrastructure. Experiment definitions are distributed to serving systems so that services can evaluate experiment state locally, reducing latency and dependency on centralized calls at runtime. This design supports high-throughput environments where real-time decision-making is required.
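One way to picture local evaluation is a service that holds the latest propagated snapshot in memory and answers experiment lookups without a network call. The sketch below is an assumption-laden illustration: the `LocalExperimentConfig` class, the JSON snapshot format, and the version field are hypothetical, and the article does not describe Google's actual propagation mechanism.

```python
import json
import threading


class LocalExperimentConfig:
    """In-process copy of experiment definitions, refreshed by pushed snapshots."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = -1
        self._experiments: dict = {}

    def apply_snapshot(self, payload: str) -> bool:
        """Install a snapshot if it is newer than the current one.

        Returns True if applied, False if the snapshot was stale.
        """
        snapshot = json.loads(payload)
        with self._lock:
            if snapshot["version"] <= self._version:
                return False  # Ignore out-of-order or duplicate deliveries.
            self._version = snapshot["version"]
            self._experiments = snapshot["experiments"]
            return True

    def is_enabled(self, name: str) -> bool:
        """Answer from local memory; no remote call on the request path."""
        with self._lock:
            exp = self._experiments.get(name)
        return bool(exp and exp.get("enabled", False))


if __name__ == "__main__":
    config = LocalExperimentConfig()
    config.apply_snapshot(json.dumps(
        {"version": 1, "experiments": {"new_button": {"enabled": True}}}
    ))
    print(config.is_enabled("new_button"))
```

The version check is a design choice in this sketch: it keeps a late-arriving older snapshot from overwriting newer experiment state.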

Anil Bhagavatula, Vice President at digi edZe, summarized the approach in a LinkedIn post:

"The takeaway is that infrastructure experimentation involves more than just code adjustments; it requires a robust, statistically rigorous, and safe framework that treats the data center as a laboratory."

The experimentation infrastructure is tightly coupled with analytics pipelines that aggregate results across services. This allows teams to evaluate the impact of a change not only within a single service but across end-to-end user journeys. By standardizing both assignment and measurement, the system reduces operational overhead for product teams and enables faster iteration. In consolidating experimentation primitives into shared infrastructure, the company aims to improve both the speed of product decisions and the confidence behind them across its ecosystem.
