Amazon Aurora PostgreSQL-Compatible Edition recently introduced a logical replication write-through cache to reduce storage I/O during logical decoding. The new feature decreases replication lag by reducing read I/O from storage and shortening transaction catch-up time.
According to the cloud provider, the logical replication write-through cache significantly reduces replication lag without a drop in transactions per second (TPS). Describing their tests with a pgbench workload, Susan Douglas, developer advocate at AWS, and Scott Mead, principal database engineer at AWS, write:
If a replication stream is generated too rapidly for an instance to keep up, the replica falls behind; this is replication lag. The default write-through cache realizes a 44% improvement in replication lag over the last release of Aurora PostgreSQL. Increasing the write-through cache size to 2 GB improves the replication lag even more, to 59%.
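To make the quoted percentages concrete, the short Python sketch below applies them to a baseline lag. The 10-second baseline is a hypothetical value chosen for illustration, not a figure from the AWS tests; only the 44% and 59% reductions come from the quote.

```python
def reduced_lag(baseline_seconds: float, improvement_pct: float) -> float:
    """Return the lag remaining after a percentage reduction of the baseline lag."""
    if not 0 <= improvement_pct <= 100:
        raise ValueError("improvement_pct must be between 0 and 100")
    return baseline_seconds * (1 - improvement_pct / 100)


# Hypothetical baseline: 10 seconds of replication lag on the previous release.
BASELINE = 10.0
default_cache = reduced_lag(BASELINE, 44)  # default write-through cache size
two_gb_cache = reduced_lag(BASELINE, 59)   # cache size raised to 2 GB
print(f"default cache: {default_cache:.1f} s, 2 GB cache: {two_gb_cache:.1f} s")
```

With this assumed baseline, the default cache would leave about 5.6 seconds of lag and the 2 GB cache about 4.1 seconds.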
Logical replication in PostgreSQL, which provides fine-grained control of replication for Aurora PostgreSQL clusters, is a publish/subscribe capability: it decodes the write-ahead log (WAL) into a stream of records that subscribers consume. The new Aurora feature is designed to decrease the latency and CPU overhead of PostgreSQL logical replication, so logically replicated data reaches clients sooner.
Source: https://aws.amazon.com/blogs/database/achieve-up-to-17x-lower-replication-lag-with-the-new-write-through-cache-for-aurora-postgresql/
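As a rough mental model of the publish/subscribe flow described above, the Python sketch below treats the WAL as an ordered list of change records, "decodes" only those belonging to published tables into a logical change stream, and has a subscriber apply them to its own copy of the data. All names (`WalRecord`, `decode`, `Subscriber`) are hypothetical; real PostgreSQL logical decoding reads binary WAL through an output plugin such as `pgoutput` and is far more involved.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class WalRecord:
    lsn: int                     # log position (stand-in for a real WAL LSN)
    table: str
    op: str                      # "INSERT", "UPDATE" or "DELETE"
    key: int
    row: Optional[dict] = None   # new row contents; None for DELETE


def decode(wal: Iterable[WalRecord], publication: set[str]) -> Iterator[WalRecord]:
    """Yield, in log order, only the changes for tables in the publication."""
    for record in sorted(wal, key=lambda r: r.lsn):
        if record.table in publication:
            yield record


@dataclass
class Subscriber:
    tables: dict = field(default_factory=dict)  # table -> {key: row}
    applied_lsn: int = 0                        # position of the last applied change

    def apply(self, change: WalRecord) -> None:
        rows = self.tables.setdefault(change.table, {})
        if change.op in ("INSERT", "UPDATE"):
            rows[change.key] = change.row
        elif change.op == "DELETE":
            rows.pop(change.key, None)
        self.applied_lsn = change.lsn


wal = [
    WalRecord(1, "orders", "INSERT", 1, {"item": "book"}),
    WalRecord(2, "audit", "INSERT", 1, {"msg": "login"}),  # not published
    WalRecord(3, "orders", "UPDATE", 1, {"item": "ebook"}),
    WalRecord(4, "orders", "INSERT", 2, {"item": "pen"}),
    WalRecord(5, "orders", "DELETE", 2),
]
sub = Subscriber()
for change in decode(wal, publication={"orders"}):
    sub.apply(change)
print(sub.tables)       # {'orders': {1: {'item': 'ebook'}}}
print(sub.applied_lsn)  # 5
```

In this model, the gap between the newest position written to the WAL and the subscriber's `applied_lsn` is, in effect, the replication lag; here the subscriber has caught up completely.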
Among other uses, logical replication can export data to AWS managed services such as AWS Database Migration Service and Amazon Kinesis, or to external databases in cloud migration projects. Douglas and Mead explain how the write-through cache works:
As write transactions are committed in an Aurora PostgreSQL cluster, the corresponding WAL records are written to both Aurora storage and the Aurora PostgreSQL WAL cache. If the cache is full (the size is defined by the rds.logical_wal_cache parameter), the oldest record in the cache is removed and the new record is appended to the end of the cache (typically referred to as a First-in, First-out, or FIFO queue).
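The write-through, first-in, first-out behavior described in the quote can be illustrated with a small Python model. This is a hypothetical sketch, not Aurora's implementation: the class and method names are invented, capacity is counted in records rather than bytes (in Aurora the limit is set by `rds.logical_wal_cache`), and "storage" is just a dict. Every committed record is written to both storage and the cache; when the cache is full, the oldest record is evicted; a reader such as a logical decoder tries the cache first and falls back to storage on a miss.

```python
from __future__ import annotations

from collections import OrderedDict


class WriteThroughWalCache:
    """Toy FIFO write-through cache for WAL records (hypothetical model)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.storage: dict[int, bytes] = {}                   # stand-in for durable storage
        self.cache: OrderedDict[int, bytes] = OrderedDict()   # insertion order = FIFO order
        self.hits = 0
        self.misses = 0

    def write(self, lsn: int, record: bytes) -> None:
        # Write-through: the record always lands in storage *and* in the cache.
        self.storage[lsn] = record
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict the oldest record (FIFO)
        self.cache[lsn] = record

    def read(self, lsn: int) -> bytes:
        # A decoder reads from the cache first and only goes to storage on a miss.
        if lsn in self.cache:
            self.hits += 1
            return self.cache[lsn]
        self.misses += 1
        return self.storage[lsn]

    def oldest_cached_lsn(self) -> int | None:
        # Oldest record still in the cache, or None if the cache is empty.
        return next(iter(self.cache), None)


cache = WriteThroughWalCache(capacity=3)
for lsn in range(1, 6):
    cache.write(lsn, f"record-{lsn}".encode())
print(list(cache.cache))          # [3, 4, 5]
cache.read(5)                     # served from the cache
cache.read(1)                     # evicted earlier, so read from storage
print(cache.hits, cache.misses)   # 1 1
print(cache.oldest_cached_lsn())  # 3
```

An `OrderedDict` gives FIFO eviction through `popitem(last=False)` while keeping lookups by log position cheap. The hit/miss counters and the oldest-entry lookup only loosely resemble the kind of information Aurora's monitoring functions report.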
Aurora offers three functions to evaluate and manage the write-through cache for PostgreSQL databases: aurora_stat_logical_wal_cache() returns cache usage information per replication slot, aurora_stat_reset_wal_cache() resets the cache metric counters on the writer instance, and get_oldest_wal_cache_ptr() returns the oldest page in the logical WAL cache.
The write-through cache is enabled by default for clusters using logical replication in Aurora PostgreSQL versions 11.17, 12.12, 13.8, and 14.5.
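To check whether a cluster's engine version falls in the range listed above, a hypothetical helper like the one below compares a `major.minor` version string against the per-major minimums. It assumes, as is typical for minor releases, that later minor versions in the same major line also enable the cache by default; for major versions the article does not list, it returns `None` rather than guessing.

```python
from __future__ import annotations

# Minimum minor version per major version, as listed in the article.
MIN_VERSION_WITH_CACHE = {11: 17, 12: 12, 13: 8, 14: 5}


def cache_enabled_by_default(version: str) -> bool | None:
    """True/False for majors 11-14; None for majors the article does not cover.

    Assumption: later minor releases in the same major line keep the feature.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected a 'major.minor' version, got {version!r}")
    major, minor = int(parts[0]), int(parts[1])
    minimum = MIN_VERSION_WITH_CACHE.get(major)
    if minimum is None:
        return None
    # Compare numerically: as strings, "11.9" would sort after "11.17".
    return minor >= minimum


for v in ("11.9", "11.17", "13.8", "14.4", "15.2"):
    print(v, cache_enabled_by_default(v))
```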
AWS will drop support for Aurora PostgreSQL 11.x versions in January 2024. The decision to retire the major version without offering a native migration path that uses logical replication to avoid downtime raised concerns in the community.