
Instacart Creates Real-Time Item Availability Architecture with ML and Event Processing

Instacart combined machine learning with event-based processing to create an architecture that provides customers with an indication of item availability in near real-time. The new solution helped to improve user satisfaction and retention by reducing order cancellations due to out-of-stock items. The team also created a multi-model experimentation framework to help enhance model quality.

Instacart has grappled with the challenge of accurately predicting the availability of grocery items for years, but the COVID pandemic and supply chain issues made this functionality even more critical to the company's success, given that it covers millions of items across 80k+ retail locations. Viswa Mani Kiran Peddinti, director of engineering at Instacart, highlighted the importance of improving order quality, including up-to-date item availability, to ensure customer satisfaction:

Based on intuition and user research, we learned that order quality, specifically found rate, is critical for customer retention. However, we needed to find a way to validate that hypothesis and benchmark for a 'good found rate'. The benchmark is important for us to find the optimal zone in the selection-found rate curve for the system to operate in that maximizes customer confidence measured through retention. We conducted multiple experiments and developed analytical models to measure the impact of changing found rate on customer retention.

To provide users with up-to-date and accurate item availability data, the company created a new architecture that brings availability scores for shopping items to customer applications in near real-time. The architecture includes a periodic full sync process that uses the Snowflake data warehouse and DB ingestion workers to move availability scores, computed by ML models, to SQL transactional databases. For frequently searched items, an on-demand refresh process using AWS Kinesis and S3 ingests item availability scores into the DB more often than the full sync does.

Lazy Score and Full Sync Refresh Architecture (Source: Instacart Technology Blog)
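
The two refresh paths can be illustrated with a minimal Python sketch. It is not Instacart's implementation: the in-memory `AvailabilityStore`, the `(location, item)` key shape, the `score_fn` callback, and the `max_age_s` staleness threshold are all hypothetical stand-ins for the warehouse, queues, and database named in the article.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical key shape: (retailer_location_id, item_id).
ItemKey = Tuple[str, str]


@dataclass
class AvailabilityStore:
    """In-memory stand-in for the transactional database (assumption)."""
    scores: Dict[ItemKey, float] = field(default_factory=dict)
    updated_at: Dict[ItemKey, float] = field(default_factory=dict)

    def upsert(self, key: ItemKey, score: float, now: float) -> None:
        self.scores[key] = score
        self.updated_at[key] = now


def full_sync(store: AvailabilityStore, snapshot: Dict[ItemKey, float], now: float) -> int:
    """Periodic path: bulk-load every score from a warehouse snapshot."""
    for key, score in snapshot.items():
        store.upsert(key, score, now)
    return len(snapshot)


def lazy_refresh(
    store: AvailabilityStore,
    searched: Iterable[ItemKey],
    score_fn: Callable[[ItemKey], float],
    now: float,
    max_age_s: float = 3600.0,  # hypothetical staleness threshold
) -> List[ItemKey]:
    """On-demand path: rescore only searched items whose stored score is stale or missing."""
    refreshed = []
    for key in searched:
        last = store.updated_at.get(key)
        if last is None or now - last > max_age_s:
            store.upsert(key, score_fn(key), now)
            refreshed.append(key)
    return refreshed
```

In this sketch, items that are searched often are rescored between full syncs, while rarely searched items keep the score from the last bulk load.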

Instacart revamped its ML availability scoring model pipeline to improve interpretability, handle the growing product catalog with many infrequently shopped products, and adapt to changing customer habits. Additionally, the company wanted to upgrade its ML infrastructure and move to its new MLOps platform.

The new model combines three components for general, trending, and real-time availability scores. The general score describes typical item availability patterns and is recalculated weekly. The trending score quantifies short-term deviations from the typical patterns and is recomputed daily or hourly. Lastly, the real-time score is based on the latest observations, using an event-driven streaming architecture that employs Apache Kafka and Flink to deliver signals sourced from customer applications and retailer systems.
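
The article does not say how the three components are merged into a single score, so the following Python sketch shows only one plausible way to do it. The additive treatment of the trending deviation, the exponential decay of the real-time signal, and the `half_life_s` parameter are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoreComponents:
    general: float             # typical availability pattern (recalculated weekly)
    trending: float            # short-term deviation from typical; may be negative
    realtime: Optional[float]  # score from latest observations, None if no recent signal
    realtime_age_s: float = 0.0


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def combine(c: ScoreComponents, half_life_s: float = 1800.0) -> float:
    """Blend the components into one availability score in [0, 1].

    Assumption: the trending score is an additive correction to the general
    score, and the real-time score loses half its weight every `half_life_s`
    seconds, so stale observations fall back to the baseline.
    """
    baseline = clamp(c.general + c.trending)
    if c.realtime is None:
        return baseline
    weight = 0.5 ** (c.realtime_age_s / half_life_s)
    return clamp(weight * c.realtime + (1.0 - weight) * baseline)
```

Under these assumptions, a fresh "item not found" observation dominates the score immediately, and its influence fades as the observation ages.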

The new model pipeline accommodates varied item purchase frequencies and reduces computation costs by about 80% compared to the previous approach. Additionally, the new solution takes full advantage of the capabilities offered by Griffin, the company's new MLOps platform. It supports model versioning and use-case-based model variations to obtain the most suitable item availability score.

Improved Model Pipeline (Source: Instacart Technology Blog)

The company created a new multi-model experimentation framework to help implement ML model improvements while reducing the engineering effort needed to run model experiments in parallel. Another framework, named Delta Framework, provides a mechanism to adjust availability scores for specific product, retailer, or geographical segments on an ad-hoc basis, which helps with ML model experimentation and with reacting to unexpected events that affect item availability. The Delta Framework also supports multi-segment optimization: it continuously monitors the availability scores presented to customers, the found rate, and order retention, and can recommend segment-level score adjustments.
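
Segment-level adjustments of this kind can be sketched in Python as follows. This is an illustration rather than the Delta Framework's actual design: the `Delta` record, its wildcard matching, and the additive, clamped adjustment are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Delta:
    """Hypothetical ad-hoc adjustment; a field left as None matches any value."""
    adjustment: float
    product_id: Optional[str] = None
    retailer_id: Optional[str] = None
    region: Optional[str] = None

    def matches(self, product_id: str, retailer_id: str, region: str) -> bool:
        return (
            self.product_id in (None, product_id)
            and self.retailer_id in (None, retailer_id)
            and self.region in (None, region)
        )


def apply_deltas(
    score: float,
    product_id: str,
    retailer_id: str,
    region: str,
    deltas: Iterable[Delta],
) -> float:
    """Add every matching adjustment to the model score and clamp to [0, 1]."""
    total = sum(d.adjustment for d in deltas if d.matches(product_id, retailer_id, region))
    return max(0.0, min(1.0, score + total))
```

For example, a single `Delta(-0.2, region="TX")` would lower the scores of all items in that region during a regional supply disruption without retraining the model.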
