Mathieu Ripert on Instacart's Machine Learning Optimizations
Instacart is an online grocery delivery service that delivers in under one hour. Customers order items on the website or via the mobile app, and Instacart’s shoppers go to local stores, purchase the items, and deliver them to the customer.
InfoQ interviewed Mathieu Ripert, data scientist at Instacart, to find out how machine learning is leveraged to guarantee a better customer experience.
InfoQ: What architecture are you using to process event data and what is the data volume/ throughput?
Ripert: First, we have a distributed architecture. Our system is split on business domain (Catalog, Logistics, Shoppers, Partners, Customers, etc.), as this allows each domain to be developed and deployed independently, and ensures a clear primary owner for the code and models within. Each domain has its own data stores, and we use RabbitMQ to facilitate communication between domains.
We use PostgreSQL for our production database and Amazon Redshift for offline analytics. Data is partitioned by day or week depending on the quantity of data. For example, for our fulfillment engine, we track GPS locations of all our shoppers in real time, which represents around 1GB / day of data, and this data is partitioned by day.
InfoQ: Instacart delivers groceries within an hour. How do you test sensitivity changes in your models and ensure they can reliably generate predictions once deployed?
Ripert: First, we backtest our models extensively using historical data. This helps us ensure our models perform well not only on the data they were trained on, but also on data held out from the same time period and from future time periods. Our models are retrained daily to account for new data and trend changes.
Then, in production we track the performance of the models using multiple statistics (e.g., both the RMSE and also the MAE for a regression model). We have an internal query tool that can easily compute statistics like this on production data and throw alerts whenever statistics deviate from the norm. Finally, we have operations teams who watch markets, and make real-time adjustments to predictions using our interfaces if they see sudden changes in performance due to inclement weather or other special events. We are working towards automating more of those adjustments as well.
InfoQ: The XGBoost library for gradient boosting is applied in supervised learning to create tree ensembles. How are you using this software library to optimize the overall delivery time?
Ripert: In a given market, we might have thousands of deliveries to do in the next few hours, and hundreds of personal shoppers on shift and able to fulfill orders. Our fulfillment system needs to decide in real time how to combine orders into batches of deliveries that in-store shoppers can shop for in store, and delivery drivers can pick up and deliver to customers' addresses. Our objectives for this system are to: (a) ensure that the customer gets the groceries they ordered, (b) ensure that the delivery is made within the one-hour window the customer selected at checkout, (c) ensure that the shoppers can be as efficient as possible, allowing us to take even more orders from customers.
Predictions from GBM models are used to estimate how long any given shopper will take to either shop for an order in a store, or to deliver an order to a customer's address. In part, those predictions are crucial for ensuring that the decisions we make for fulfilling orders result in deliveries being made on time. But perhaps even more importantly, these predictions let our optimization algorithms know which combinations of shoppers and deliveries will allow us to move as fast as possible as a system – ultimately increasing our speed.
InfoQ: XGBoost can be used with distributed processing frameworks like Apache Spark and Apache Flink. What does your setup look like and how do you ensure your models are computed in a timely fashion?
Ripert: For fulfillment we have been able to build models on single instances, using the multi-core processing supported in XGBoost natively. In part, this is possible because we can build separate models for separate regions, since combining far-flung regions won't improve generalization performance significantly as the most important features are local time and space effects.
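The per-region split can be sketched as a dispatch from region key to that region's own model. In this illustration a trivial mean-based stand-in takes the place of an actual XGBoost regressor, and the region keys are made up:

```python
from collections import defaultdict

class MeanRegressor:
    """Trivial stand-in for a per-region XGBoost model: predicts the training mean."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)

def train_per_region(rows):
    """Train one model per region; rows are (region, features, target) tuples."""
    by_region = defaultdict(lambda: ([], []))
    for region, x, y in rows:
        by_region[region][0].append(x)
        by_region[region][1].append(y)
    return {region: MeanRegressor().fit(X, y) for region, (X, y) in by_region.items()}

# Toy training rows: (region, [feature], delivery minutes)
rows = [
    ("sf", [1.0], 30.0), ("sf", [2.0], 34.0),
    ("nyc", [1.0], 50.0), ("nyc", [2.0], 54.0),
]
models = train_per_region(rows)
print(models["sf"].predict([[1.5]]))   # [32.0]
print(models["nyc"].predict([[1.5]]))  # [52.0]
```

Because each region's model is independent, the regional models can be trained in parallel and each fits comfortably on a single machine, which is the point made above.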
Given how much AWS has been pushing instance memory and CPU capacity (up to 2 TB of RAM and 128 CPUs), we have been able to avoid using distributed processing frameworks for most problems. Where we have been unable to fit data into memory is product recommendations: we use Apache Spark and MLlib to fit collaborative filtering models there.
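Conceptually, the collaborative filtering MLlib provides is a low-rank factorization fit by alternating least squares (ALS). A minimal NumPy illustration of the alternation on a small dense toy ratings matrix (not the distributed MLlib implementation, which handles sparse observed entries at scale) looks like this:

```python
import numpy as np

def als(R, k=2, lam=0.1, iters=20, seed=0):
    """Alternating least squares: factor R ~ U @ V.T with ridge regularization."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(size=(n_users, k))
    V = rng.normal(size=(n_items, k))
    reg = lam * np.eye(k)
    for _ in range(iters):
        # Fix V and solve the least-squares problem for U, then swap roles.
        U = R @ V @ np.linalg.inv(V.T @ V + reg)
        V = R.T @ U @ np.linalg.inv(U.T @ U + reg)
    return U, V

# Toy user-item ratings: two groups of users with opposite tastes
R = np.array([[5.0, 4.0, 0.0, 1.0],
              [4.0, 5.0, 1.0, 0.0],
              [0.0, 1.0, 5.0, 4.0],
              [1.0, 0.0, 4.0, 5.0]])
U, V = als(R)
error = np.linalg.norm(R - U @ V.T)
print(error)  # small reconstruction error relative to ||R||
```

The predicted affinity of user `i` for item `j` is then the dot product `U[i] @ V[j]`, which is what makes scoring unseen user-item pairs cheap.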
InfoQ: What data sources are considered for optimization of delivery time, and how far back is historical data analyzed?
Ripert: For delivery time, we use the origin and destination latitude and longitude, the timestamp when we are planning to begin the delivery, customer address related information (residential vs business address), and the mode of delivery (car, bike, walker).
For our optimization engine, we have to make 100,000 predictions per minute in our large geographies because we are considering many combinations of deliveries and shoppers. That limits how much additional information we can include in these predictions – both because a lot of information isn't known yet, and also because the feature generation and prediction must be highly scalable.
Once we have made delivery assignments, we can update predictions in the future using information about the specific shopper, their location, recent movement in space, and additional data like weather. We are continuing to explore and test these extension ideas.
InfoQ: How are NLP (Natural Language Processing) techniques used to improve the overall experience?
Ripert: NLP is most relevant for Instacart on the discovery side of our business. Just by analyzing past purchase behavior, we can make strong recommendations for commonly purchased items to our users. However, we have many millions of products across hundreds of retail partners in our catalogue, with a long-tailed distribution for order frequency. We use NLP to generalize our recommendations to work for items that are purchased infrequently. For example, if we can learn that a user has an affinity for gluten-free products and Italian cuisine, we can extend recommendations to all products that have terms that are similar to those.
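The "terms that are similar" idea can be illustrated with cosine similarity between term embedding vectors. The three-dimensional vectors below are toy values chosen for the example, not learned embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-d "embeddings": related dietary terms point in a similar direction
vec = {
    "gluten-free": [0.9, 0.1, 0.0],
    "wheat-free":  [0.8, 0.2, 0.1],
    "gelato":      [0.1, 0.2, 0.9],
}
print(cosine(vec["gluten-free"], vec["wheat-free"]))  # high similarity
print(cosine(vec["gluten-free"], vec["gelato"]))      # low similarity
```

With real learned embeddings, ranking a rarely purchased product by its terms' similarity to a user's known affinities is what lets recommendations generalize down the long tail.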
InfoQ: Is deep learning on your roadmap to solve challenges you face?
Ripert: Absolutely. We are already using embedding models in our Search and Discovery team to solve natural language processing problems. In our catalogue team, we are also actively experimenting with convolutional neural networks for image processing. Finally, we have been prototyping a deep learning model to predict the sequence that our best shoppers shop for items in at a given store location, and are planning to test re-ordering shopping lists for our shoppers using that model early in the new year.
Fundamentally, deep learning is opening up new problems that previously would have been intractable with standard machine learning tools, and required a tremendous amount of feature engineering. Given the volume of data we are collecting, we see huge opportunities to use deep learning to improve the experiences of both our customers and our shoppers.
InfoQ: What challenges did you run into or what lessons did you learn from using big data & machine learning solutions in your product?
Ripert: We faced many different types of challenges:
- Data integrity - The most important thing in machine learning is data, so we need to make sure the data we collect is clean and free of noise. For example, when a shopper finishes a delivery, they have to press a button in the shopper app to end the trip, and we use these timestamps to train our delivery time prediction model. However, for various reasons, some shoppers may not tap “delivered” at the exact moment of delivery, so we built models to filter and infer the correct timestamps from GPS data.
- Anomaly detection - When a severe storm occurs, it can bias predictions in the near future if time is included as a feature.
- Variance modeling - Our fulfillment engine needs to decide how to combine orders together in order to minimize the total travel time, while making sure orders are delivered on time. To make sure we account for the variance of our predictions, we use a quantile regression to estimate an upper bound of the delivery time, and we plan based on this upper limit.
- Latency - As we have to make hundreds of thousands of predictions per minute, the “speed vs. accuracy” tradeoff becomes a real challenge.
- Performance evaluation - Choosing the best model evaluation metrics based on our business goals is another challenge. For example, MAE might not be the best measure for our prediction models: a one-minute error does not have the same cost depending on whether the order is late or on time.
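The variance-modeling point above, planning against an upper bound rather than a point estimate, can be sketched with an empirical quantile. The 90th percentile here is an illustrative choice, not Instacart's actual parameter:

```python
def empirical_quantile(values, q):
    """Empirical q-quantile: a value such that roughly a fraction q of samples fall at or below it."""
    ordered = sorted(values)
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]

# Toy observed delivery times in minutes
times = list(range(1, 101))
upper = empirical_quantile(times, 0.9)
covered = sum(t <= upper for t in times) / len(times)
print(upper, covered)  # planning on this bound covers at least 90% of observed times
```

A quantile regression model produces such an upper bound conditioned on the features of each delivery, so the planner can schedule against a per-delivery bound rather than a single global percentile.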