The Guardian Optimizes Mobile Push-Notification Delivery Architecture

The technology team at the Guardian has been working to speed up mobile push notifications and improve the experience for readers. The original architecture, although optimized for concurrency, suffered from delays in notification delivery. With improved observability in place, the engineers achieved significant gains through experimentation.

Guardian readers can use the mobile app to access content and can register to receive breaking-news alerts via push notifications. The event-driven architecture (EDA) behind this feature has been operating since 2009. Over time, notification delivery times have increased, with some users waiting more than five minutes.

Francesca Hammond, a full-stack developer at the Guardian, indicated that the team aimed to deliver notifications to 90% of the intended audience within two minutes, a target the team dubbed "90in2".
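The 90in2 target can be expressed as a simple check over recorded delivery latencies. The following is a minimal sketch in Python, used here purely for illustration; the function names and the idea of collecting per-recipient latencies in seconds are assumptions, not details from the Guardian's implementation.

```python
from typing import Sequence

TARGET_FRACTION = 0.90   # the "90" in 90in2
TARGET_SECONDS = 120.0   # the "2" (minutes) in 90in2


def fraction_within(latencies_s: Sequence[float], limit_s: float = TARGET_SECONDS) -> float:
    """Return the share of deliveries completed within limit_s seconds."""
    if not latencies_s:
        raise ValueError("no deliveries recorded")
    on_time = sum(1 for latency in latencies_s if latency <= limit_s)
    return on_time / len(latencies_s)


def meets_90in2(latencies_s: Sequence[float]) -> bool:
    """True when at least 90% of deliveries arrived within two minutes."""
    return fraction_within(latencies_s) >= TARGET_FRACTION
```

For example, nine deliveries at 30 seconds and one at 400 seconds would meet the target, while eight on-time deliveries out of ten would not.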

The solution supporting push-notification delivery uses a range of technologies. An internal breaking-news tool that talks to a Scala Play application triggers push-notification delivery. AWS Lambda functions consume messages from AWS SQS queues, fetch notification registrations from a self-hosted PostgreSQL database, and send the notifications to the Google and Apple push-notification platforms.
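The flow described above can be sketched as a single SQS-triggered handler. This is a simplified illustration in Python rather than the Guardian's Scala code: it collapses the fetching and sending stages into one function, and the `fetch_tokens` and `send_push` callables as well as the `topic` field in the message body are hypothetical. Only the event shape (a `Records` list whose items carry a string `body`) reflects how Lambda delivers SQS batches.

```python
import json
from typing import Any, Callable, Dict, Iterable


def handle_sqs_event(
    event: Dict[str, Any],
    fetch_tokens: Callable[[str], Iterable[str]],
    send_push: Callable[[str, Dict[str, Any]], None],
) -> int:
    """Process one SQS-triggered invocation and return the number of sends.

    fetch_tokens and send_push are hypothetical stand-ins for the
    registration lookup and the Apple/Google delivery call.
    """
    sent = 0
    # Lambda passes SQS messages as a batch under "Records";
    # each message payload is a string stored in "body".
    for record in event.get("Records", []):
        notification = json.loads(record["body"])
        for token in fetch_tokens(notification["topic"]):
            send_push(token, notification)
            sent += 1
    return sent
```

Passing the lookup and delivery functions in as parameters keeps the sketch testable without a database or a push provider.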


The team improved the observability of the overall process using the ELK stack, which was essential for identifying the bottlenecks.

They identified the retrieval of notification registrations as the main bottleneck responsible for the delays. Further investigation revealed a large number of database connection errors, which led to high processing times. To address this problem, the team introduced an RDS Proxy so that Lambda functions no longer connect to the database directly and therefore avoid hitting the database's connection limit.
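The principle behind putting a pooling proxy in front of a database can be illustrated with a toy model. The sketch below is in Python and is not how RDS Proxy is implemented; `FakeDatabase` and `PoolingProxy` are hypothetical classes that only show why capping concurrent connections turns connection errors into short waits.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeDatabase:
    """Toy database that rejects connections beyond max_connections."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.open_connections = 0
        self.peak = 0
        self._lock = threading.Lock()

    def connect(self) -> None:
        with self._lock:
            if self.open_connections >= self.max_connections:
                raise ConnectionError("too many connections")
            self.open_connections += 1
            self.peak = max(self.peak, self.open_connections)

    def close(self) -> None:
        with self._lock:
            self.open_connections -= 1


class PoolingProxy:
    """Caps concurrent connections by making extra callers wait their turn."""

    def __init__(self, db: FakeDatabase, pool_size: int) -> None:
        self._db = db
        self._slots = threading.BoundedSemaphore(pool_size)

    def run_query(self, work):
        with self._slots:  # blocks instead of failing when the pool is busy
            self._db.connect()
            try:
                return work()
            finally:
                self._db.close()


def run_clients(proxy: PoolingProxy, clients: int) -> int:
    """Run concurrent callers through the proxy; return how many completed."""
    with ThreadPoolExecutor(max_workers=clients) as executor:
        results = list(executor.map(lambda i: proxy.run_query(lambda: i), range(clients)))
    return len(results)
```

With a limit of five connections, fifty concurrent callers all complete through the proxy, whereas connecting directly could raise `ConnectionError` for any caller that arrives while five connections are already open.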

Long query execution times were identified as another source of delays. After confirming that the query plans were correct, the team improved database performance in two further ways: a full-vacuum process removed "dead rows" (logically deleted rows the database was still holding), and the database was upgraded from version 10 to 13, which allowed using more powerful AWS Graviton2 processors.

To minimize downtime during the switchover, the team performed the upgrade by creating a new RDS instance. They set up logical replication to synchronize the data continuously while the old instance kept serving the application services. At switchover, the team updated the services to use the new instance and immediately disabled the logical replication.


Outside of the persistence layer, developers found that the Lambda functions responsible for submitting notifications to the Apple and Google platforms were taking up to six minutes to complete for breaking news with more than 800,000 recipients.

The team conducted several experiments by deploying potential optimizations and observing the results, each time deciding whether to retain the change or revert it. Based on these experiments, they increased the thread-pool size of the Scala applications running in the Lambda functions to improve parallelism. Furthermore, they set the amount of memory and CPU available to the Lambda functions to the supported maximum, which resulted in lower function-execution times.
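The effect of thread-pool size on blocking sends can be sketched as follows. The Guardian's applications are written in Scala; this Python version is only an illustration, and `send_all`, `estimated_seconds`, and the fixed per-send latency are assumptions. The estimate ignores provider rate limits, retries, and CPU contention.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def send_all(tokens: Sequence[str], send: Callable[[str], bool], pool_size: int) -> int:
    """Send to every token using pool_size worker threads; return successes."""
    with ThreadPoolExecutor(max_workers=pool_size) as executor:
        return sum(1 for ok in executor.map(send, tokens) if ok)


def estimated_seconds(recipients: int, pool_size: int, latency_s: float) -> float:
    """Rough wall-clock estimate if every send blocks for latency_s seconds."""
    return math.ceil(recipients / pool_size) * latency_s
```

Under this rough model, 1,000 sends at 0.5 seconds each take about 50 seconds with 10 threads and about 5 seconds with 100, which is why a larger pool helps when the work is dominated by waiting on network calls.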

Hammond wrote that the team is taking stock before continuing:

We're not done yet! We think larger changes to our architecture might be needed in order to achieve our 90in2 target, specifically when considering larger notifications sent to 2+ million subscribers. Because of the nature of the changes required, we want to try implementing an RFC-style process to gather ideas and feedback before starting development.

The Guardian's core notification platform is open source.
