Predicting Time to Cook, Arrive, and Deliver at Uber Eats

Key Takeaways

  • With the mission "Make eating well effortless, every day, for everyone", one of our top priorities at Uber Eats is ensuring reliability. We need delivery partners to arrive at restaurants the moment food is ready, so time prediction plays a critical role across the order lifecycle.
  • Uber Eats's dispatch system launched with a greedy matching algorithm and became more efficient by switching to a global matching algorithm powered by time predictions.
  • Uber's in-house machine learning platform, Michelangelo, has provided tremendous help in simplifying the overall process for data scientists and engineers to solve machine learning problems.
  • Our biggest challenge, the lack of ground truth in the O2O business model, was tackled by inferring the label data through feature engineering work and leveraging a feedback loop for model retraining.
  • The estimated time-of-delivery prediction model is designed to be flexible enough to handle various scenarios, because new information surfaces at different stages of an order.

Uber Eats has been one of the fastest-growing food delivery services since its initial launch in Toronto in December 2015. Currently, it’s available in over 600 cities worldwide, serves more than 220,000 restaurant partners, and reached 8 billion in gross bookings in 2018.

The ability to accurately predict delivery times is paramount to customer satisfaction and retention. Additionally, time predictions are important on the supply side as we calculate the time to dispatch delivery partners. My recent talk covered how Uber Eats has leveraged machine learning to address these challenges.

Machine learning in Uber Eats

With the mission "Make eating well effortless, every day, for everyone", one of our top priorities is ensuring reliability. First, we need to set the right expectations by providing precise delivery time estimates in order to avoid frustration in the case of delays. Next, we have to compute the right moment to send out delivery partners to pick up food. Ideally, they should arrive at the restaurant the moment food is ready. If they arrive too early, they take up the restaurant’s parking and dine-in space. If they arrive late, the food gets cold. Time prediction therefore plays a critical role across the order lifecycle, which is shown in the figure below.

Time predictions through an order lifecycle

Machine learning is the only practical way to reach this level of accuracy. However, challenges arise along the way. Compared with other machine learning problems, our biggest challenge is the lack of ground truth data, which is common in the online-to-offline (O2O) business model. Ground truth is also the most critical component in machine learning; as we all know, "garbage in, garbage out." Another challenge is the uniqueness of Uber Eats as a three-sided marketplace (delivery partners, restaurants, and eaters), which makes it necessary to take all partners into account for every decision we make.

Fortunately, Uber’s in-house machine learning platform, Michelangelo, has provided tremendous help in simplifying the overall process for data scientists and engineers to solve machine learning problems. It provides generic solutions for data collection, feature engineering, modeling, and serving both offline and online predictions, which saves a lot of time compared to reinventing the wheel.

Michelangelo - Online and offline predictions

The figure above provides an overview of both the online and offline prediction pipelines. The online part is primarily for making predictions with real-time and near real-time data, which is collected via Kafka, pre-processed by streaming engines such as Samza and Flink, and then persisted in Cassandra feature stores. For the offline part, data collected from different sources is pre-processed by Spark SQL and persisted in Hive feature stores. We then train models using the algorithms provided by the platform, which supports everything in Spark MLlib as well as some deep learning models.

Feature report - each feature’s distribution and importance to the model

This is a feature report for a trained model. It provides lots of detailed information such as data distribution and feature importance, which is super helpful for feature engineering work.

How time predictions power the dispatch system

Now let’s find out how our dispatch system works and how it has evolved through time predictions. At a high level, the dispatch system is the brain of the Eats business. Its goal is to make optimal demand-and-supply matching decisions. In our context, demand is the eater’s order and supply is the delivery partner. As mentioned before, timing is key since we need to ensure the delivery partner arrives at the restaurant the moment food is ready for pickup.

We went through multiple iterations before reaching the current stage. To show the degree of impact, the following compares how dispatch timing was decided before and after introducing time predictions.

We started with fixed guessed values to decide when to send out the delivery partner to pick up food. For example, if an eater places an order at 5:30 pm, we assume the food takes 25 mins to prepare and the delivery partner needs 5 mins to get to the restaurant, so we would start to dispatch at 5:50 pm. Clearly, it’s not sound to use 25 mins for all orders from that restaurant regardless of whether eaters order 1 dish or 10 dishes, and the same goes for using 5 mins as every delivery partner’s travel time. As a result, it caused lots of confusion among all the partners: eaters couldn’t track their food accurately, delivery partners faced extended waiting periods at restaurants, and restaurants had no idea where the delivery partner was while the food got cold.

Once we introduced time predictions, we replaced both fixed assumptions with machine learning-based predictions. The first one was the food preparation time prediction. Instead of using 25 mins for all orders from the same restaurant, we used a trained machine learning model to make a prediction for each order, which is 30 mins in the following example.

Dispatch system - dispatch timing

In the meantime, instead of using 5 mins for all delivery partners’ travel time, we also used a machine learning model to estimate the travel time of eligible delivery partners, which is why we start to dispatch 10 mins before the food ready time in the example. Satisfaction among all partners improved significantly. Eaters were able to get fast deliveries with clear expectations. Both restaurants and delivery partners became more efficient, so they could complete more orders in the same period of time.
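The dispatch-timing arithmetic above can be sketched as follows. This is only an illustration: `dispatch_plan` is a hypothetical helper, the date is arbitrary, and reusing the 5:30 pm order time for the predicted case is an assumption. In practice the two inputs would come from the prediction models.

```python
from datetime import datetime, timedelta

def dispatch_plan(order_placed, prep_minutes, travel_minutes):
    """Return (food_ready, dispatch_at) so the partner arrives as the food is ready."""
    food_ready = order_placed + timedelta(minutes=prep_minutes)
    dispatch_at = food_ready - timedelta(minutes=travel_minutes)
    return food_ready, dispatch_at

order_placed = datetime(2019, 1, 1, 17, 30)  # arbitrary date, order placed at 5:30 pm

# Before: fixed guesses applied to every order.
fixed_ready, fixed_dispatch = dispatch_plan(order_placed, prep_minutes=25, travel_minutes=5)

# After: per-order values that would come from the prediction models.
pred_ready, pred_dispatch = dispatch_plan(order_placed, prep_minutes=30, travel_minutes=10)

print("fixed:    ", fixed_ready.strftime("%H:%M"), fixed_dispatch.strftime("%H:%M"))
print("predicted:", pred_ready.strftime("%H:%M"), pred_dispatch.strftime("%H:%M"))
```

With these particular numbers both plans start dispatching at 5:50 pm. The difference is that the predicted plan expects the food at 6:00 pm rather than 5:55 pm and allows 10 minutes of travel, and both values change from order to order.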

On top of that, our dispatch platform also switched from a greedy to a global matching algorithm.

The greedy matching algorithm only starts looking for a delivery partner when an order comes in. The result is optimal for a single order but not for all the orders in our system from a global perspective. Therefore, we changed to the global matching algorithm so that we can solve the entire set of orders and delivery partners as a single global optimization problem. What does that mean? Let’s do a quick comparison as shown in the following figure. The figure on the left shows the greedy matching algorithm. When the first order comes in, our dispatch system matches it with the most eligible delivery partner, who would take 1 min to arrive at the restaurant. When the second order comes in, the system matches it with the delivery partner who would take 5 mins. So their total travel time is 6 mins.

The figure on the right shows the global matching algorithm, in which the new dispatch system considers both orders and both delivery partners at the same time and matches them in a globally optimal way. As a result, each delivery partner would need 2 mins and their total travel time is only 4 mins.

Dispatch system - comparison between greedy and global algorithms
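The comparison can be reproduced with a small sketch. The travel-time matrix below is an assumption chosen to match the totals in the example (6 mins versus 4 mins), and the brute-force search stands in for whatever solver a production system would use; it is only workable for a handful of orders.

```python
from itertools import permutations

# Hypothetical travel times in minutes: travel[order][partner].
travel = [
    [1, 2],  # order 0: partner 0 is 1 min away, partner 1 is 2 mins away
    [2, 5],  # order 1: partner 0 is 2 mins away, partner 1 is 5 mins away
]

def total_time(travel, assignment):
    """Sum of travel times when order i is served by assignment[i]."""
    return sum(travel[order][partner] for order, partner in enumerate(assignment))

def greedy_match(travel):
    """Assign each order, in arrival order, to the closest partner still free."""
    free = set(range(len(travel[0])))
    assignment = []
    for row in travel:
        best = min(free, key=lambda partner: row[partner])
        assignment.append(best)
        free.remove(best)
    return assignment

def global_match(travel):
    """Try every one-to-one assignment and keep the one with the lowest total."""
    partners = range(len(travel[0]))
    best = min(permutations(partners, len(travel)),
               key=lambda perm: total_time(travel, perm))
    return list(best)

greedy = greedy_match(travel)
optimal = global_match(travel)
print(greedy, total_time(travel, greedy))    # [0, 1] 6
print(optimal, total_time(travel, optimal))  # [1, 0] 4
```

Greedy takes the 1-min partner for the first order and is then left with the 5-min partner for the second; the global search gives up the 1-min match so that both orders get a 2-min partner.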

The global matching algorithm is clearly more efficient. However, it depends on accurate time predictions, because matching decisions made across many orders at once are far more sensitive to timing errors. That’s also one of the main reasons we could only switch to the global matching algorithm after introducing time predictions.

Deep dive in time predictions

Now let’s dig into the details of time predictions. As we’ve already seen in the previous section, food preparation time prediction is critical to the business since it’s the main factor determining when to dispatch the delivery partner. As mentioned earlier, one of the biggest challenges in applying machine learning to the O2O business model is ground truth collection, and food preparation time prediction is a typical example. Because we’re not physically in the restaurant and restaurants don’t have any incentive to provide related information, it’s almost impossible to know exactly when the food is ready. What we can do is use other signals from delivery partners and restaurants to infer the ground truth. However, the inference is not always accurate due to unpredictable circumstances such as parking availability or the time spent walking to find the restaurant’s entrance. Therefore, we have focused on two areas since the very beginning. One is how to better utilize the available data through feature engineering, and the other is how to increase model accuracy by inferring the label data and leveraging a feedback loop for model retraining.

The feature engineering work is divided into three sets: historical features like the average food preparation time for the past week; near real-time features like the average food preparation time for the last 10 minutes; and real-time features like time of day, day of the week, and order size. We needed real-time and near real-time features to handle situations that can change rapidly. As an example, the balance between orders and delivery partners could be affected by bad weather.
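As a rough sketch of the three feature sets, the snippet below assembles one feature row for an order. The function name, field names, and the in-memory `history` list are all hypothetical; in a real pipeline the historical and near real-time aggregates would be precomputed and read from feature stores rather than calculated on the fly.

```python
from datetime import datetime, timedelta

def mean(values):
    """Average of a list, or None when there is nothing to average."""
    return sum(values) / len(values) if values else None

def build_features(now, order, history):
    """history: list of (completed_at, prep_minutes) for one restaurant."""
    week_ago = now - timedelta(days=7)
    ten_min_ago = now - timedelta(minutes=10)
    return {
        # Historical: average prep time over the past week.
        "avg_prep_7d": mean([m for t, m in history if week_ago <= t <= now]),
        # Near real-time: average prep time over the last 10 minutes.
        "avg_prep_10m": mean([m for t, m in history if ten_min_ago <= t <= now]),
        # Real-time: known at request time.
        "hour_of_day": now.hour,
        "day_of_week": now.weekday(),  # Monday is 0
        "order_size": order["num_items"],
    }

now = datetime(2019, 1, 4, 18, 0)
history = [
    (now - timedelta(days=8), 50),     # too old for either window
    (now - timedelta(days=3), 20),     # inside the 7-day window only
    (now - timedelta(minutes=5), 30),  # inside both windows
]
features = build_features(now, {"num_items": 3}, history)
print(features)
```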

One example of our feature engineering work is leveraging representation learning for cuisine type features. In real-time features, we’ve already considered order-specific data like price and number of items. However, we don’t just want to know how many items are in an order, we want to know what those items are, since food preparation time can vary significantly among cuisine types, such as preparing a beef stew versus a salad. In order to improve model accuracy by taking the cuisine type into account, we use our menu data to generate word embeddings, classify them into different categories, and make them part of the real-time features.

Representation learning - menu embedding
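To make the idea concrete, here is a toy sketch of turning menu item names into a category feature. Everything in it is an assumption for illustration: the 2-D vectors are hand-written rather than learned from menu data, and the two category names and the nearest-centroid rule are placeholders for whatever classification the real pipeline applies.

```python
import math

# Toy 2-D "embeddings"; real ones would be learned from menu text.
word_vectors = {
    "beef": (0.9, 0.1), "stew": (0.8, 0.2),
    "salad": (0.1, 0.9), "garden": (0.2, 0.8),
}

# Hypothetical category centroids in the same vector space.
category_centroids = {"slow_cooked": (1.0, 0.0), "cold_prep": (0.0, 1.0)}

def embed(item_name):
    """Average the vectors of the known words in a menu item name."""
    vecs = [word_vectors[w] for w in item_name.lower().split() if w in word_vectors]
    if not vecs:
        return None
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def categorize(item_name):
    """Return the category whose centroid is most similar to the item."""
    vec = embed(item_name)
    if vec is None:
        return "unknown"
    return max(category_centroids, key=lambda c: cosine(vec, category_centroids[c]))

print(categorize("Beef Stew"))     # slow_cooked
print(categorize("Garden Salad"))  # cold_prep
```

The resulting category can then be fed to the model as one more real-time feature alongside price and item count.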

Meanwhile, we’re always exploring more ways to collect new signals for improving the accuracy of inferred label data. Sensor signals from delivery partners’ phones are one of them. They are mainly used for detecting the delivery partner’s status in complicated circumstances. For example, if food pickup gets delayed when the delivery partner is already close to the restaurant, we need to know whether the cause is difficulty locating the restaurant or inability to find parking. Here we rely on the sensor data to predict their next move based on their current state.

Sensor signals

The method we’ve used is conditional random field modeling, with the goal of classifying the current state from a sequence of observations. By leveraging this probabilistic model and labeling the sequential data of some trips, we are able to predict a delivery partner’s next activity, such as moving from parking to picking up food.

Conditional random field model
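A minimal sketch of how a linear-chain conditional random field labels a sequence is shown below. The states, the observation buckets, and every score are invented for illustration; in a trained model the emission and transition scores would be learned from the labeled trips. Decoding uses the standard Viterbi algorithm to find the highest-scoring state sequence.

```python
STATES = ["driving", "parking", "walking"]

# Hand-set scores standing in for learned CRF weights (higher is more compatible).
EMISSION = {  # EMISSION[state][observed speed bucket]
    "driving": {"fast": 2.0, "slow": 0.5, "still": -1.0},
    "parking": {"fast": -2.0, "slow": 1.0, "still": 1.5},
    "walking": {"fast": -2.0, "slow": 1.5, "still": 0.0},
}
TRANSITION = {  # TRANSITION[previous state][next state]
    "driving": {"driving": 1.0, "parking": 0.5, "walking": -2.0},
    "parking": {"driving": -1.0, "parking": 0.5, "walking": 1.0},
    "walking": {"driving": -2.0, "parking": -2.0, "walking": 1.0},
}

def viterbi(observations):
    """Return the state sequence with the highest total score."""
    best = {s: EMISSION[s][observations[0]] for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointer = {}, {}
        for state in STATES:
            prev = max(STATES, key=lambda p: best[p] + TRANSITION[p][state])
            new_best[state] = best[prev] + TRANSITION[prev][state] + EMISSION[state][obs]
            pointer[state] = prev
        best = new_best
        backpointers.append(pointer)
    # Walk the backpointers from the best final state to recover the path.
    path = [max(STATES, key=lambda s: best[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]

path = viterbi(["fast", "fast", "still", "slow", "slow"])
print(path)  # ['driving', 'driving', 'parking', 'walking', 'walking']
```

Because the transition scores favor moving from parking to walking and penalize going back to driving, the model reads the "still" reading as parking and the slow readings that follow as walking toward the restaurant.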

On the other hand, the feedback loop that we have introduced for model retraining has also significantly improved our inferred label data. Before digging into more details, I’d like to first explain the food preparation time prediction model.

We use gradient-boosted decision trees trained with XGBoost and leverage the hyperparameter tuning provided by the Michelangelo platform to improve model accuracy and training performance. The hyperparameter tuning approach in our food preparation time model is Bayesian optimization. The idea is to make smart choices among all the possible combinations of parameters needed by XGBoost.

Hyperparameter tuning

So how does it work? Looking at the plot on the left, the posterior curve shows the 3 parameter combinations we have tried so far. The larger blue area in the middle shows the region we don’t have much confidence in, since we haven’t tried any combination there yet. To decide which combination to try next, we use an acquisition function, which tells us which candidate has a higher probability of yielding a better outcome. The plot on the right shows the results after experimenting with the combination recommended by the acquisition function.
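The acquisition step alone can be sketched as follows. This is not the platform’s implementation: the posterior means and uncertainties are made-up numbers (a real tuner would fit them, typically with a Gaussian process), `max_depth` is just one example XGBoost parameter, and the lower confidence bound is one common acquisition function among several.

```python
# Hypothetical posterior over validation error for candidate max_depth values:
# max_depth -> (predicted error, uncertainty). Tried points have low uncertainty.
posterior = {
    3: (5.0, 0.2),   # tried
    6: (4.6, 1.5),   # not tried yet: wide uncertainty band
    9: (4.8, 0.3),   # tried
    12: (5.5, 0.2),  # tried
}

def lower_confidence_bound(mean, std, kappa=2.0):
    """Optimistic error estimate: rewards both low mean and high uncertainty."""
    return mean - kappa * std

def next_candidate(posterior):
    """Pick the candidate with the most promising (lowest) bound."""
    return min(posterior, key=lambda c: lower_confidence_bound(*posterior[c]))

choice = next_candidate(posterior)
print(choice)  # 6
```

The untried value wins because its wide uncertainty band leaves room for a much lower error than any combination evaluated so far. After it is evaluated, its uncertainty shrinks and the process repeats.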

Now let’s get back to the feedback loop we mentioned multiple times. It is another key component for improving prediction accuracy given the lack of ground truth. Since we have to infer the true food preparation time, an approximation error always exists between inferred and actual data. Reducing this error was our top priority. We introduced a feedback loop into our model to fit the approximation error, which helped to correct the inferred prep time, especially when new signals became available. For instance, we check each completed order to see whether we have collected more information that implies the actual food preparation time. If so, we use that information to update the inferred value and use it for future model retraining.

Feedback loop
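The label-correction part of this loop can be sketched as below. The record layout, the `label_source` field, and the idea that a later signal arrives as a plain mapping from order ID to corrected minutes are all assumptions for illustration.

```python
def refresh_labels(orders, new_signals):
    """Replace inferred prep times where later signals give a better estimate.

    orders: list of dicts with "order_id" and an inferred "prep_minutes".
    new_signals: dict mapping order_id to a corrected prep time in minutes.
    """
    updated = []
    for order in orders:
        corrected = new_signals.get(order["order_id"])
        if corrected is not None:
            # Copy the record so the original training set is left untouched.
            order = {**order, "prep_minutes": corrected, "label_source": "feedback"}
        updated.append(order)
    return updated

completed_orders = [
    {"order_id": "a1", "prep_minutes": 25, "label_source": "inferred"},
    {"order_id": "b2", "prep_minutes": 18, "label_source": "inferred"},
]
# Hypothetical later signal suggesting order a1 was actually ready after 31 mins.
training_set = refresh_labels(completed_orders, {"a1": 31})
print(training_set)
```

The refreshed records would then replace the original inferred labels in the next retraining run.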

Another important feature of time predictions is estimated time-of-delivery (ETD), which constantly affects eaters’ experience during the entire order lifecycle. For example, whenever eaters are wondering how much longer they need to wait for their food, they can check our ETD predictions first. Although ETD and food preparation time predictions share a lot of similar techniques in terms of data processing, feature engineering, etc., there are also many differences between the two.

What makes ETD prediction unique is that its inputs vary across stages, as more and more information surfaces along the way. For example, when an eater is browsing restaurants, we only have features like the restaurant’s location and cuisine type to make predictions. When an eater places the order, we then have more detailed features from the order itself, such as number of items and pricing. When we’ve matched a delivery partner to pick up the order, we have even more information from the delivery partner, such as their estimated travel time. Therefore, our ETD prediction model needs to be flexible enough to handle all of these variations.

Eater-facing ETD
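One way to picture the stage-dependent inputs is a fixed feature schema in which values that are not yet known stay empty. The stage names, field names, and the choice of `None` as the missing-value marker are assumptions for this sketch, not a description of the production model.

```python
STAGE_ORDER = ["browsing", "ordered", "matched"]

def etd_features(stage, restaurant, order=None, partner=None):
    """Build one fixed feature schema; values not yet known stay None."""
    level = STAGE_ORDER.index(stage)
    features = {
        "restaurant_location": restaurant["location"],
        "cuisine_type": restaurant["cuisine_type"],
        "num_items": None,
        "order_price": None,
        "partner_travel_minutes": None,
    }
    if level >= 1:  # the order has been placed
        features["num_items"] = order["num_items"]
        features["order_price"] = order["price"]
    if level >= 2:  # a delivery partner has been matched
        features["partner_travel_minutes"] = partner["travel_minutes"]
    return features

restaurant = {"location": (37.77, -122.42), "cuisine_type": "thai"}
order = {"num_items": 2, "price": 24.5}
partner = {"travel_minutes": 11}

browsing = etd_features("browsing", restaurant)
ordered = etd_features("ordered", restaurant, order)
matched = etd_features("matched", restaurant, order, partner)
print(browsing, ordered, matched, sep="\n")
```

Keeping the schema constant means the same prediction code path can be called at every stage, with the prediction expected to tighten as the empty fields fill in.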

The last feature in time predictions worth mentioning is travel time estimation. Uber Eats couldn’t have grown so fast in such a short period without the support of the rides business. We’re not only sharing the drivers, but also building the tech stack on a shared platform, and travel time estimation is one of the shared components. It’s critical for rides since every second matters for each trip. Riders are able to track their trips from requesting a trip, through getting picked up, to arriving at the destination. For Uber Eats, the biggest difference is that we have non-car delivery partners, such as bikers and walkers. In large cities like New York and San Francisco, bikers and walkers are much better at avoiding congestion and parking problems. Making accurate travel time predictions for these types of delivery partners is very important to the Eats business. In order to do that, we worked with Uber’s Maps team to collect non-car data and trained a separate model. The following figure shows a biker who has been matched to pick up food, with a travel time estimate of 11 mins.

Non-car travel time estimation


Now you understand the critical role that time prediction plays in Uber Eats and how we leveraged machine learning technologies to tackle some of the hardest problems in the O2O business model. We’ll certainly continue to invest in this area going forward and build a top-class machine learning solution for serving all the partners in our three-sided marketplace. For more information related to machine learning at Uber, please visit our engineering blogs. If you have more specific questions related to time predictions, feel free to reach out to me via LinkedIn.

About the Author

Zi Wang is an Engineering Manager at Uber, leading the machine learning engineering work for time predictions in Uber Eats, including estimated time of delivery, food preparation time, and travel time estimation. In addition, he worked on the Uber Eats dispatch system, Uber Rush, and an in-house payment system. Before Uber, Zi worked at Microsoft and built the real-time collaborative editing feature for major Office apps. He is passionate about building highly scalable and performant backend services and leveraging big data and machine learning to solve real-world problems in order to ensure system efficiency and user experience.
