As part of its Super Bowl marketing campaign, Duolingo sent out 4 million mobile push notifications when the company’s five-second ad aired during the commercial break. At QCon London, Duolingo's engineers presented the asynchronous AWS architecture responsible for broadcasting messages to millions of users across seven US cities.
Duolingo is an educational platform for learning foreign languages that leverages the AWS cloud platform, including its many services. The company built a microservices-based technology platform to serve nearly 27 million monthly active users. Last year, the company decided to run a marketing campaign during the Super Bowl that required sending out millions of push notifications precisely when a short ad was aired during the commercial break.
During the presentation, Vitor Pellegrino, SRE at Duolingo, explained that the company couldn’t know in advance the exact time the advert would be aired. Hence, the solution required human action to kick off the push notification delivery. Furthermore, the company wanted to deliver all or almost all mobile notifications in a short period (ideally during the ad itself) and ensure that each user would receive just a single notification, even if the entire process was triggered independently by more than one person.
Zhen Zhou, senior software engineer at Duolingo, talked through the architecture the team built to support the timely delivery of push notifications, as well as the validation activities required to ensure the requirements would be met on the big day. Duolingo created a dedicated asynchronous architecture comprising API Gateway, Python components running in AWS ECS, and SQS queues. The solution sourced user and device data from DynamoDB and an S3 bucket, and used CloudWatch for observability.
The Architecture for Sending Out Push Notifications at Speed/Scale (Source: QCon London)
The main challenge the team had to address was ensuring that 4 million push notifications could be swiftly published to the Google (FCM) and Apple (APN) platforms once the process was triggered, while preventing duplicates. Engineers used a FIFO SQS queue as an entry point to dedupe messages. FIFO queues support a 5-minute deduplication window but only a 300 messages/second delivery rate, so if a higher delivery rate were required, a custom solution based on a cache or the database would be necessary. The solution uses a second, standard SQS queue to trigger the publishing of push notifications. Since SQS queues have an in-flight message limit of 120,000 messages, engineers used data batching to support the required publication rate.
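The two ideas in this paragraph, a deduplication window and batching, can be illustrated with a minimal Python sketch that does not call SQS at all. `DedupWindow` is a hypothetical in-memory stand-in for the FIFO queue's 5-minute deduplication behavior, and `chunk` shows how batching lets one queue message carry many recipients. All names and the batch size of 500 are assumptions for illustration; the talk did not state them.

```python
import time
from typing import Dict, Iterator, List, Optional, Sequence

DEDUP_WINDOW_SECONDS = 5 * 60  # 5-minute window mentioned in the talk
BATCH_SIZE = 500  # hypothetical; the real batch size was not stated


class DedupWindow:
    """In-memory stand-in for FIFO-queue deduplication (not the SQS API)."""

    def __init__(self, window_seconds: float = DEDUP_WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        self._accepted_at: Dict[str, float] = {}

    def accept(self, dedup_id: str, now: Optional[float] = None) -> bool:
        """Return True only for the first trigger with this ID inside the window."""
        now = time.monotonic() if now is None else now
        accepted_at = self._accepted_at.get(dedup_id)
        if accepted_at is not None and now - accepted_at < self.window_seconds:
            return False  # duplicate trigger, e.g. a second person starting the send
        self._accepted_at[dedup_id] = now
        return True


def chunk(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def messages_needed(total_items: int, size: int) -> int:
    """Number of queue messages required when each carries `size` items."""
    return (total_items + size - 1) // size
```

Under the assumed batch size of 500, 4 million notifications would fit in 8,000 queue messages, which is far below the 120,000 figure cited above. A real deployment would rely on the queue's own deduplication rather than process-local state, since a single in-memory dictionary is not shared between instances.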
On the day, in preparation for sending push notifications, an engineer first had to upload the campaign data (user-to-device mappings) for 4 million users to an S3 bucket. Then, a couple of hours before the Super Bowl commercial break, 5,000 notification worker application instances were manually provisioned by modifying the autoscaling group (ASG) and ECS task. Additionally, engineers provisioned 20 interim workers responsible for prefetching data from S3 and holding it in memory so that it could be sent out quickly in batches once the process started.
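As a rough illustration of the prefetching step, the sketch below shows one way campaign records could be divided among the 20 interim workers so that each holds a disjoint slice in memory. The talk did not describe how the data was actually partitioned; the round-robin split, the record shape, and the function name are all assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical record shape: (user_id, device_token).
Record = Tuple[str, str]

INTERIM_WORKER_COUNT = 20  # number of interim workers mentioned in the talk


def shard_for_worker(
    records: Sequence[Record],
    worker_index: int,
    worker_count: int = INTERIM_WORKER_COUNT,
) -> List[Record]:
    """Return the slice of records a given worker would prefetch.

    Uses a simple round-robin split (an assumption, not the team's method):
    worker i takes records i, i + worker_count, i + 2 * worker_count, ...
    """
    if worker_count <= 0:
        raise ValueError("worker_count must be positive")
    if not 0 <= worker_index < worker_count:
        raise ValueError("worker_index out of range")
    return list(records[worker_index::worker_count])
```

With an even split of this kind, 4 million records across 20 workers would leave each worker holding 200,000 records in memory.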
Finally, during the commercial break, marketing managers who were monitoring the live stream across multiple channels kicked off the delivery of push notifications, and the architecture published 95% of notifications in 3.9 seconds and 99% of notifications in 5.7 seconds.
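Figures such as "95% in 3.9 seconds" are typically derived from per-notification publish times. The following is a minimal sketch of that calculation using the nearest-rank method, assuming a list of publish offsets measured in seconds from the trigger; the function name and input format are hypothetical, and the talk did not describe how the team computed its numbers.

```python
from typing import Sequence


def time_to_percent(offsets_seconds: Sequence[float], percent: int) -> float:
    """Return the earliest offset by which `percent`% of items were published.

    Nearest-rank method: sort the offsets and take the value at
    rank ceil(percent / 100 * n), computed with integer arithmetic.
    """
    if not offsets_seconds:
        raise ValueError("offsets_seconds must not be empty")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(offsets_seconds)
    rank = (percent * len(ordered) + 99) // 100  # ceiling division
    return ordered[rank - 1]
```

For example, given offsets of 0.5, 1.0, 2.0, and 4.0 seconds, half of the notifications were published within 1.0 second and all of them within 4.0 seconds.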
Pellegrino and Zhou shared how the team prepared for the Super Bowl campaign, including running three rounds of push notification delivery, starting with 1 million users, to validate the end-to-end process at scale. Before these real-user tests, engineers spent time ensuring they could scale out the push-notification delivery process, as well as the rest of Duolingo's platform, and addressing any performance bottlenecks.
After the talk, the audience asked several questions, including:
QCon Attendee: There is one part of the system that you don't have control over, which is how fast Google and Apple will actually send out notifications. Did you talk to them before?
Zhen Zhou: That is a very fair question. I mean, that definitely came up to us. Well, first of all we don't have control over their system. The second of all, we actually have to do some investigation to understand whether they have rate limits against doing things this way.
I can answer the second question for sure. We reached out to them and asked if they have specific rate limits for sending notifications, which they don't. And the first one is kind of a hard problem because honestly we don't really have control over their API, we only have control over our internal systems. We built with the goal in mind that we only want to measure the performance of our internal systems. It's definitely a harder problem to approach when you are working with a vendor.
QCon Attendee: How much of an extra increase was it on your cloud bill?
Vitor Pellegrino: It was a lot for a brief amount of time :)