
Expo: Real Time A/B Testing and Monitoring with Spark Streaming and Kafka at Walmart Labs


The engineering team at WalmartLabs developed a real-time A/B testing tool called "Expo" that collects and analyzes user engagement metrics. It uses Spark Structured Streaming to process incoming log data and stores the metrics in KairosDB.

A/B testing measures the impact on user behavior of changes to an app, website, or algorithm compared to an existing version. The "variations" in an A/B experiment are the different versions of the UI or algorithm exposed to users; the version already running in production is called the "control". WalmartLabs' Expo "splits the traffic between these variations" and captures user engagement metrics. The system can detect anomalous trends and false positives in the metrics as the experiment runs. Expo is built using Spark Structured Streaming jobs, Kafka, KairosDB, and Grafana. It replaces a previous pipeline based on the Lambda architecture, which had performance issues during periods of high traffic. InfoQ got in touch with Reza Esfandani, senior software engineer at WalmartLabs, to learn more about the tool.

Expo works by processing logs (server logs, beacons) that are forwarded to a Kafka cluster, with each user in the logs identified by a session ID. Spark jobs run every minute, using Structured Streaming to aggregate log events and generate metrics in two phases. A copy of each job runs in another datacenter for resiliency. The jobs write the generated metrics to KairosDB, a time series database backed by Cassandra. The system also pushes its own health metrics into KairosDB, which feed Grafana dashboards. Esfandani describes the scale at which this runs:

Each instance of the job is currently running on 180 cores. We are getting data at the rate of 30k-60k per second and pushing metrics to Cassandra at the rate of 1-3 million records per minute.
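At that scale the ingest side is essentially a Kafka source feeding Structured Streaming. The sketch below shows what such an ingest phase might look like; the broker address, the "expo-logs" topic name, and the event schema are assumptions for illustration, not Expo's actual configuration.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object ExpoLikeIngest {
  // Hypothetical schema for the JSON log events on the topic.
  val logSchema: StructType = new StructType()
    .add("sessionId", StringType)
    .add("variation", StringType)   // e.g. "control" or "variant-a"
    .add("action", StringType)
    .add("eventTime", TimestampType)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("expo-like-ingest").getOrCreate()
    import spark.implicits._

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka-1:9092")  // placeholder broker
      .option("subscribe", "expo-logs")                   // hypothetical topic
      .load()
      // Kafka delivers the payload as bytes; decode and parse the JSON.
      .selectExpr("CAST(value AS STRING) AS json")
      .select(from_json($"json", logSchema).as("event"))
      .select("event.*")

    // The sessionization and aggregation phases would consume `events`;
    // here it is simply echoed to the console for illustration.
    events.writeStream
      .format("console")
      .option("truncate", "false")
      .start()
      .awaitTermination()
  }
}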

The Spark jobs, written in Scala, aggregate the actions of each user into a "session". Streaming frameworks such as Apache Spark and Apache Flink let developers process continuous streams of time series data by grouping events into windows. Time series data can arrive late or out of order, and these frameworks provide abstractions to handle such cases. In the second phase of the Expo jobs, metrics across all sessions are aggregated for each minute and saved to KairosDB, giving the metric value across all users for that minute.
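This second phase could be expressed as a windowed streaming aggregation along the following lines. The sketch assumes a streaming Dataset named sessionMetrics with columns (variation, metricName, value, eventTime) produced by the sessionization phase (sketched further below); the column names are hypothetical, not Expo's.

import org.apache.spark.sql.functions._
// assumes `import spark.implicits._` is in scope for the $ syntax

val perMinute = sessionMetrics
  .withWatermark("eventTime", "10 minutes")   // tolerate late arrivals
  .groupBy(
    window($"eventTime", "1 minute"),         // tumbling one-minute windows
    $"variation",
    $"metricName")
  .agg(
    sum($"value").as("total"),
    count("*").as("sessions"))

// Each output row is one metric value per variation per minute; a custom
// sink would then write these rows out to KairosDB.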

Image courtesy of Reza Esfandani, used with permission.

Esfandani says the team had to write custom code for sessionization, since Spark is a general-purpose framework. He notes that "the sessionization part that we developed at WalmartLabs can be open sourced with some modification to be used as a third party library for sessionization on top of Spark". Expo uses Spark's watermarking feature to handle late events; its watermark is 10 minutes, according to Esfandani, meaning events can arrive up to 10 minutes late and still be processed.
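The source does not show Expo's sessionization code, but custom sessionization on top of Spark is typically built with flatMapGroupsWithState. The sketch below, continuing the ingest example above, is one way it can look; the session metric shape and the 30-minute inactivity gap are assumptions (and the variation field is omitted from the output for brevity).

import java.sql.Timestamp
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}
// assumes `import spark.implicits._` is in scope for the encoders

case class LogEvent(sessionId: String, variation: String,
                    action: String, eventTime: Timestamp)
case class SessionState(events: Int, lastSeenMs: Long)
case class SessionMetric(sessionId: String, events: Int)

// Called once per session ID per micro-batch with that session's new
// events; emits a metric row only once the session times out.
def sessionize(
    sessionId: String,
    newEvents: Iterator[LogEvent],
    state: GroupState[SessionState]): Iterator[SessionMetric] = {
  if (state.hasTimedOut) {
    // Inactivity gap elapsed: emit the finished session, drop its state.
    val done = state.get
    state.remove()
    Iterator(SessionMetric(sessionId, done.events))
  } else {
    // Fold the new events into the running session state.
    val current = state.getOption.getOrElse(SessionState(0, 0L))
    val batch = newEvents.toSeq
    val updated = SessionState(
      current.events + batch.size,
      batch.map(_.eventTime.getTime).foldLeft(current.lastSeenMs)(math.max))
    state.update(updated)
    // Close the session 30 minutes (assumed gap) after its last event.
    state.setTimeoutTimestamp(updated.lastSeenMs, "30 minutes")
    Iterator.empty
  }
}

val sessions = events                        // stream from the ingest sketch
  .withWatermark("eventTime", "10 minutes")  // the watermark Esfandani mentions
  .as[LogEvent]
  .groupByKey(_.sessionId)
  .flatMapGroupsWithState(
    OutputMode.Append, GroupStateTimeout.EventTimeTimeout)(sessionize)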

Each experiment in Expo runs under a tenant ID (a mobile app or website), with predefined measurement points. Expo currently supports eight tenants across Walmart.
