


Unpacking How Ads Ranking Works @ Pinterest: Aayush Mudgal at QCon San Francisco

At the recent QCon San Francisco conference, Aayush Mudgal discussed how ads ranking works at Pinterest. He unpacked how Pinterest harnesses deep learning models and big data to tailor relevant advertisements to its users. Mudgal, a senior machine learning engineer at Pinterest, explained how the company ranks ads, the historical reasons behind the architecture, and the improvements achieved over the years.

During the talk, Mudgal discussed Pinterest's objectives and how the company ensures user engagement. He showed the models used to filter, find, and select relevant ads, as well as the features used to create embeddings that represent a user's interests and how an advertisement can fit into them. The last part of the talk covered the architectures and technical challenges behind the serving stack Pinterest has built.

A Two-Sided Marketplace

The first thing to note is that Pinterest operates a two-sided marketplace: users engage with a library of pins for inspiration, while advertisers pay to connect with those users. Ads are seamlessly blended into the feed, mirroring the content the user is viewing.

Advertisers pursue different objectives, such as awareness or clicks, and can either set a maximum bid or opt for auto-bidding. The platform predicts clicks, saves, hides, and relevance, and directs optimized traffic to advertiser sites, ensuring that users encounter pertinent, engaging advertisements. Advertisers, in turn, want their ads to be relevant and to drive engagement; that is what keeps them on the platform.
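The interplay between bids and click predictions can be sketched as an expected-value calculation. This is a minimal, hypothetical illustration; the function names and the scoring formula are assumptions, not Pinterest's actual auction logic:

```python
# Hypothetical sketch: ranking ads by expected value (advertiser bid times
# the model's predicted click probability). Illustrative only.

def expected_value(bid: float, p_click: float) -> float:
    """Expected revenue if this ad is shown."""
    return bid * p_click

def rank_ads(ads: list) -> list:
    """Order candidate ads by descending expected value."""
    return sorted(ads,
                  key=lambda ad: expected_value(ad["bid"], ad["p_click"]),
                  reverse=True)

candidates = [
    {"id": "ad_a", "bid": 2.00, "p_click": 0.01},
    {"id": "ad_b", "bid": 0.50, "p_click": 0.08},  # lower bid, higher relevance
]
print([ad["id"] for ad in rank_ads(candidates)])  # ad_b wins: 0.04 > 0.02
```

Note how a lower bid can still win when the predicted engagement is high enough, which is what aligns advertiser spend with user relevance.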

Deep Learning Models and User Experience

Pinterest harnesses big data to deliver a fast, personalized user experience. The information is processed through a load balancer, an app server, and an ads server in a manner that ensures low latency. Mudgal provided an overview of the architecture diagram, which grew more complicated during his talk. An overview can be seen in the image below: 

Candidate retrieval narrows the full ad inventory down to a selection of ads that could be presented to the user. These are then ranked using heavyweight models that predict the likelihood of a click. Features logged from these requests, paired with user interaction data, form the training data used to refine the machine learning models.
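The two-stage funnel described above can be sketched in a few lines. Everything here is a toy stand-in: the category-matching retrieval and the affinity-lookup "heavyweight model" are assumptions for illustration only:

```python
# Illustrative funnel: a cheap retrieval pass narrows many ads to a small
# candidate set, which a (stubbed) heavyweight ranking model then scores.

def retrieve_candidates(all_ads, user, k=100):
    """Lightweight first pass: keep up to k ads whose category matches
    the user's interests (a stand-in for real candidate retrieval)."""
    matches = [ad for ad in all_ads if ad["category"] in user["interests"]]
    return matches[:k]

def rank(candidates, user):
    """Heavyweight second pass: score each candidate (stubbed as an
    affinity lookup) and sort best-first."""
    return sorted(candidates,
                  key=lambda ad: user["affinity"].get(ad["category"], 0.0),
                  reverse=True)

user = {"interests": {"cooking", "travel"},
        "affinity": {"cooking": 0.9, "travel": 0.4}}
ads = [{"id": 1, "category": "cars"},
       {"id": 2, "category": "travel"},
       {"id": 3, "category": "cooking"}]
print([ad["id"] for ad in rank(retrieve_candidates(ads, user), user)])  # [3, 2]
```

The design point is cost: the expensive model only ever sees the small retrieved set, never the full inventory.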

Pinterest employs two procedures for its training workflow:
1. Joiner workflow: this workflow joins events with features and provides feature statistics and validation. It combines all features into a single overview, since the more information there is about the user and what they are viewing, the more accurate and relevant an advertisement can be.
2. Training workflow: This involves training and evaluating a model. It takes the data created by the joiner workflow and the logged information of the interactions to train the underlying machine learning models. 
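The joiner step can be pictured as attaching the features that were logged at serving time to the engagement events recorded later. The keying by request id and all field names below are invented for illustration:

```python
# Hypothetical joiner: pair serving-time features with later engagement
# events to produce training rows. Field names are illustrative.

def join_events_with_features(events, feature_log):
    """Produce training rows: (features as logged at serving time, label)."""
    features_by_request = {f["request_id"]: f["features"] for f in feature_log}
    rows = []
    for event in events:
        features = features_by_request.get(event["request_id"])
        if features is not None:  # drop events with no logged features
            rows.append({"features": features, "label": event["clicked"]})
    return rows

feature_log = [{"request_id": "r1", "features": {"ctr_7d": 0.02}}]
events = [{"request_id": "r1", "clicked": 1},
          {"request_id": "r2", "clicked": 0}]  # no logged features: dropped
print(join_events_with_features(events, feature_log))
```

Using the features exactly as they were logged at serving time avoids training/serving skew, which is one reason the joiner also performs feature validation.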

Users and Advertisements are Embeddings

One of the interesting insights Mudgal provided was the creation of embeddings for both advertisements and users. During training, the goal is for the embeddings of relevant ads to end up close to the embeddings of the users they are relevant for. Because ads and users share the same embedding space, ads can be selected simply by picking the advertisements closest to the user embedding. The training data, which essentially consists of user engagement data, is critical from the model's perspective. A particular strength of this approach is that the ad embeddings can be pre-computed, so during selection they can be queried from a pre-computed embedding database.
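The shared-embedding idea reduces ad selection to a nearest-neighbour lookup. A minimal sketch, assuming a dot-product similarity and toy three-dimensional vectors (in production this would be an approximate-nearest-neighbour index, not a Python dict):

```python
# Sketch: ad embeddings are pre-computed once; selection is a
# nearest-neighbour search against the user embedding. All vectors
# and the similarity choice are illustrative.

def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Pre-computed ad embeddings (stand-in for an ANN index / embedding DB).
ad_embeddings = {
    "hiking_boots": [0.9, 0.1, 0.0],
    "sports_car":   [0.0, 0.2, 0.9],
}

def closest_ads(user_embedding, k=1):
    """Return the k ad ids whose embeddings score highest for this user."""
    scored = sorted(ad_embeddings.items(),
                    key=lambda item: dot(user_embedding, item[1]),
                    reverse=True)
    return [ad_id for ad_id, _ in scored[:k]]

outdoorsy_user = [0.8, 0.3, 0.1]
print(closest_ads(outdoorsy_user))  # ['hiking_boots']
```

Because the ad side is pre-computed, only the user embedding needs to be produced at request time, keeping the lookup cheap.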

As for ad delivery: after processing through the ranking and auction models, ad display is determined based on the configured cost parameters. Elements like ad allocation, quality floors, and reserve pricing play crucial roles in the ads marketplace and auction design.
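To make the quality-floor and reserve-pricing levers concrete, here is a hypothetical second-price-style auction. The mechanics, thresholds, and names are assumptions; Pinterest's actual auction design was not specified in this detail:

```python
# Illustrative auction with a quality floor and a reserve price.

QUALITY_FLOOR = 0.05   # minimum predicted relevance to enter the auction
RESERVE_PRICE = 0.10   # minimum charge regardless of the runner-up bid

def run_auction(ads):
    """Filter by quality, rank by bid, charge max(second bid, reserve)."""
    eligible = [ad for ad in ads if ad["quality"] >= QUALITY_FLOOR]
    if not eligible:
        return None  # quality floor protects the user experience
    eligible.sort(key=lambda ad: ad["bid"], reverse=True)
    winner = eligible[0]
    runner_up_bid = eligible[1]["bid"] if len(eligible) > 1 else 0.0
    return {"winner": winner["id"], "price": max(runner_up_bid, RESERVE_PRICE)}

ads = [{"id": "a", "bid": 1.00, "quality": 0.20},
       {"id": "b", "bid": 1.50, "quality": 0.01},  # excluded by quality floor
       {"id": "c", "bid": 0.40, "quality": 0.30}]
print(run_auction(ads))  # a wins, pays max(0.40, 0.10) = 0.40
```

The quality floor keeps low-relevance ads out even when they bid high, while the reserve price sets a revenue floor when competition is thin.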

Pinterest has experimented with many different ways to represent its data. Ultimately, the team developed a representation layer for feature creation, a summarization layer for learning common embeddings, a latent cross layer for explicit feature crossing, and a fully connected layer for the neural network.

Leveraging a multitask deep model enables multiple predictions from a single model. The model was further advanced with an attention sequence, improving predictions of future interactions by considering the user's viewing history, all while maintaining low-latency performance.
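The multitask idea can be sketched as one shared representation feeding several small task heads, so a single forward pass yields all predictions. The weights, layer sizes, and linear "layers" below are toy stand-ins, not Pinterest's architecture:

```python
# Minimal multitask sketch: one shared layer, one head per prediction task.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(x, weights, bias):
    """A single linear unit: weighted sum of inputs plus a bias."""
    return sum(a * w for a, w in zip(x, weights)) + bias

SHARED_W, SHARED_B = [0.5, -0.2, 0.1], 0.0
HEADS = {  # one small head per task, all reading the shared output
    "click": ([0.8], 0.1),
    "save":  ([0.3], -0.2),
    "hide":  ([-0.5], 0.0),
}

def predict_all(features):
    """Run the shared layer once, then every task head on its output."""
    shared = linear(features, SHARED_W, SHARED_B)
    return {task: sigmoid(linear([shared], w, b))
            for task, (w, b) in HEADS.items()}

preds = predict_all([1.0, 0.5, 2.0])
print({k: round(v, 3) for k, v in preds.items()})
```

The pay-off is at serving time: the expensive shared computation runs once per request, and each extra prediction (click, save, hide) costs only a tiny head.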

System Monitoring and Model Performance Improvements

To ensure optimal model performance, Pinterest integrates testing, simulated impact through shadow traffic, offline model validation, model failure alerts, and a debugging UI that can replay a request. Continuous online model validation and alerting systems for high prediction rates and model staleness are also in place.

On the serving infrastructure front, Pinterest employs GPU serving, quantization, model distillation, and batch size reduction techniques. These techniques efficiently handle large models whilst maintaining ad ranking performance and user experience. The goal is to keep the whole process at very low latency. One trick here is that the user's history can use the pre-trained embedding space, with the activations cached for re-use.
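The caching trick can be illustrated with Python's `functools.lru_cache`: activations for items in a user's history are computed once and re-used across requests instead of being recomputed per ranking call. The `embed_history_item` function below is a stub standing in for an expensive forward pass:

```python
# Sketch of re-using cached activations for history items across requests.
from functools import lru_cache

@lru_cache(maxsize=100_000)
def embed_history_item(item_id: str) -> tuple:
    """Stand-in for an expensive forward pass over one history item;
    lru_cache serves the activation from memory on repeat requests."""
    return tuple(float(ord(c)) / 100 for c in item_id)  # toy "activation"

def score_request(history: list) -> float:
    """Aggregate (here: sum) cached history activations into one score."""
    return sum(sum(embed_history_item(item)) for item in history)

score_request(["pin_1", "pin_2"])   # computes both activations
score_request(["pin_1", "pin_3"])   # pin_1 comes from the cache
print(embed_history_item.cache_info().hits)  # 1
```

In a real system the cache would sit in a shared store rather than per-process memory, but the latency argument is the same: repeated history items cost a lookup, not a model evaluation.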

Last but not least, Mudgal explained that most Pinterest models are trained incrementally. This means there can be a different model for every day, each deployable at will. The team has even standardized deployment using MLflow, which gives them version-controlled models with tracked training parameters and evaluation metrics, making them reproducible as well.
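Incremental training can be pictured as each day's model warm-starting from the previous day's weights and taking a small update on that day's data, with every daily snapshot versioned and independently deployable. The one-feature linear model and learning rate below are purely illustrative:

```python
# Hypothetical sketch of daily incremental training with versioned snapshots.

def train_one_day(weights, day_examples, lr=0.1):
    """One toy gradient step per (x, y) example on a 1-feature linear model."""
    w = weights
    for x, y in day_examples:
        pred = w * x
        w -= lr * (pred - y) * x  # squared-error gradient step
    return w

snapshots = {}   # day -> deployable model weights (the "version control")
w = 0.0          # initial model
daily_data = {
    "2023-10-01": [(1.0, 1.0), (2.0, 2.0)],
    "2023-10-02": [(1.0, 1.2)],
}
for day, examples in daily_data.items():
    w = train_one_day(w, examples)   # warm-start from yesterday's weights
    snapshots[day] = w               # any day's snapshot can be deployed

print(sorted(snapshots))  # ['2023-10-01', '2023-10-02']
```

In Pinterest's setup the snapshot registry role is played by MLflow, which additionally tracks the training parameters and evaluation metrics alongside each model version.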
