Moumita Bhattacharya spoke at QCon SF 2024 about state-of-the-art search and ranking systems. She gave an overview of the typical structure of these systems and followed with a deep dive into how Netflix created a single combined model to handle both tasks.
Dr. Bhattacharya, a senior research scientist at Netflix, began by outlining the problem and common solution patterns. Many consumer websites have millions of users and millions of items or products, and providing a good experience means helping each user find the items they are interested in, whether via direct search or via automatic recommendation. Both search and recommendation can be solved via ranking, but at this scale it is not feasible to dynamically rank the entire catalog for each user in real time.
The typical solution is a two-step system, where the first step selects a few hundred or thousand candidate products from the full catalog of millions, then the second step ranks these candidates. The first pass needs to have high recall, meaning it should return a high fraction of all relevant products. The second step is where often "heavy machine learning algorithms are deployed."
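This two-step pattern can be illustrated with a minimal sketch. The data, scoring functions, and thresholds below are all hypothetical stand-ins, not Netflix's implementation: a cheap, high-recall first pass narrows the catalog, and a more expensive (here, trivially simple) second pass ranks the survivors.

```python
# Hypothetical two-stage pipeline: cheap candidate retrieval, then ranking.

def first_pass_retrieval(catalog, query, k=100):
    """High-recall first pass: keep any item whose title shares a word
    with the query (a stand-in for an inexpensive retrieval index)."""
    query_words = set(query.lower().split())
    candidates = [item for item in catalog
                  if query_words & set(item["title"].lower().split())]
    return candidates[:k]

def second_pass_ranking(candidates, user_affinity):
    """Second pass, where heavier models would run: here a toy score
    combining item popularity with a per-user genre affinity weight."""
    return sorted(
        candidates,
        key=lambda item: item["popularity"] * user_affinity.get(item["genre"], 1.0),
        reverse=True,
    )

catalog = [
    {"title": "space odyssey", "genre": "scifi", "popularity": 0.9},
    {"title": "space comedy", "genre": "comedy", "popularity": 0.7},
    {"title": "ocean drama", "genre": "drama", "popularity": 0.8},
]
user_affinity = {"comedy": 2.0}  # this user strongly prefers comedy

candidates = first_pass_retrieval(catalog, "space adventure")
ranked = second_pass_ranking(candidates, user_affinity)
```

The division of labor matters: the first pass only has to avoid dropping relevant items (recall), while the second pass, operating on hundreds of candidates rather than millions, can afford per-user machine-learned scoring.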
Bhattacharya then explored some specific use cases at Netflix. First was a recommendation use case: anticipatory search. Because many Netflix users are interacting with the site via a television remote, it is difficult for them to enter search queries, so Netflix tries to predict a query the user is likely to enter and recommend movies based on that. While Netflix does include a user's long-term history data in its models, it also uses "in-session" browsing signals to determine the user's current intent.
The second use case presented was the Unified Contextual Recommender (UniCoRn), a single model that can handle both search and recommendation tasks. It supports text-based search queries, for example when a user types a movie title, as well as "more like this" recommendations which, given one movie, suggest additional movies to watch. The key differences between these tasks are the context (search query vs. source movie), the engagement data, the candidate set, and any post-ranking business logic. Bhattacharya pointed out that training the single model on multiple tasks actually increased its performance: Netflix found that UniCoRn produced 7% and 10% lift, respectively, for search and recommendation.
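The unifying idea is that both tasks share one scoring interface, differing only in the context passed in. The sketch below is a toy illustration of that interface, not UniCoRn itself: the `Context` type, token-overlap scoring, and all names are assumptions made for demonstration, whereas the real model learns shared representations from engagement data.

```python
# Toy illustration of one ranker serving both search and recommendation,
# where only the "context" feature differs between tasks (hypothetical design).
from dataclasses import dataclass

@dataclass
class Context:
    task: str               # "search" or "recommendation"
    query: str = ""         # set for search (user-typed text)
    source_title: str = ""  # set for "more like this" (the seed movie)

def unified_score(context, candidate_title):
    """Stand-in scorer: token overlap between the context and the candidate.
    A real unified model would learn this function jointly across tasks."""
    if context.task == "search":
        context_tokens = set(context.query.lower().split())
    else:
        context_tokens = set(context.source_title.lower().split())
    return len(context_tokens & set(candidate_title.lower().split()))

def rank(context, candidates):
    return sorted(candidates, key=lambda c: unified_score(context, c), reverse=True)

titles = ["lost in space", "space station drama", "cooking show"]

# Same ranking code path, two different tasks:
search_ranked = rank(Context(task="search", query="space drama"), titles)
rec_ranked = rank(Context(task="recommendation",
                          source_title="lost in space"), titles)
```

Because one model serves both tasks, each task's engagement data can act as extra training signal for the other, which is consistent with the lift Bhattacharya reported from joint training.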
After her talk, Bhattacharya took questions from the audience. One attendee asked whether the combined model, despite its better performance, had any drawbacks. Bhattacharya said that Netflix did need to work to ensure "personalization is not overpowering relevance," and that how the two were merged was important. Another drawback was the model's size, which made inference latency a challenge.
Another audience member asked how the scheme would scale for domains where the number of products was much higher, such as e-commerce. Bhattacharya said she believed it would, since the model would be operating in the second step of the pipeline, after the first step produced a candidate set. However, she noted that it would require a good first-pass retrieval algorithm.