How Pinterest Scaled up Its Ad-Serving Architecture

Nishant Roy wrote on the Pinterest Engineering Blog about how Pinterest overcame a scaling problem with its ad corpus. The existing system had hit its scaling limit while the corpus still needed to grow. The redesign offloaded the ad index to a key-value store and optimised garbage collection in Pinterest's Go applications, increasing the ad corpus size by a factor of 60.

When Pinterest launched a new partnership that brought a significantly larger number of ads to the platform, they realised that the existing ad-serving architecture could not cope with the increased ad volume.

The index of active ads was published every few hours to the system and kept in memory. Memory constraints meant that the index was split into nine shards. As the ad volume increased, so did the memory requirements, and the system began to run out of memory.

The author reports that the first short-term solutions considered involved increasing the number of shards and applying vertical scaling (i.e., switching to larger virtual machines). Neither was suitable, according to the author. The former was not a good solution because it meant increasing infrastructure and maintenance costs, and it might also mean that more re-sharding could be needed in the future. The latter led them to discover that the service was concurrency-bound; thus, a smaller number of virtual machines could not handle the same traffic as before.

According to the author, the best and most future-proof solution was to stop storing the index in memory and adopt an external key-value datastore instead.

The author states that this change also enabled them to switch their cluster auto-scaling method from time-based to CPU-based, thus linking their infrastructure costs to actual traffic. This switch makes their cluster more resilient to traffic increases because a larger cluster is then automatically provisioned. Another benefit of the migration was a reduced startup time, “from 10 minutes to [less than] 2 minutes”, because the index does not need to be recreated at every startup.

Since the index is now in an external datastore, the author points out that an additional remote procedure call had to be introduced to fetch index data; parallelising independent steps and introducing a short-lived local cache compensated for the added latency.

According to the author, switching to an external key-value store also enabled them to take advantage of real-time index updates.

The author further noted that the system was also suffering from large latency spikes caused by excessive garbage collection in their Go applications. Optimising memory usage with techniques such as favouring short-lived objects, using object pools, and removing unused fields significantly reduced tail latency.
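To make the object-pool technique concrete, here is a minimal sketch using `sync.Pool` from the Go standard library. The `renderAd` function and the choice of pooling `bytes.Buffer` values are assumptions for illustration; the article does not say which objects Pinterest pooled.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses buffers across calls so that each request does not
// allocate a fresh one for the garbage collector to reclaim later.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// renderAd is a hypothetical hot-path function that needs scratch space.
func renderAd(id int, title string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // clear any contents left by a previous user
	defer bufPool.Put(buf) // hand the buffer back for reuse
	fmt.Fprintf(buf, "ad[%d]: %s", id, title)
	return buf.String() // String copies, so the result outlives the buffer
}

func main() {
	fmt.Println(renderAd(1, "example"))
}
```

Pooling mainly helps when the pooled objects are allocated frequently on a hot path; the Go runtime may still drop idle pooled objects during garbage collection, so a pool reduces allocation pressure rather than guaranteeing reuse.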

The balance between parallelisation and local caching has been discussed in some data centre hardware and software architecture research.
