AWS Introduces Durable Storage Option for ElastiCache for Valkey

AWS has recently introduced durability for Amazon ElastiCache for Valkey, allowing data to survive failures and extending the service beyond caching to persistent workloads. The feature lets developers prioritize either minimal data loss or lower write latency, broadening the use cases for the Redis fork to include AI memory, session storage, and real-time applications.

ElastiCache for Valkey now supports both caching and persistent data workloads. Developers can choose synchronous durability to minimize data loss during failures, or asynchronous durability for lower latency. Traditional ElastiCache without durability remains the default and cheapest option for caching scenarios where data can be rebuilt from source. Jules Lasarte, software engineer at AWS, and Karthik Konaparthi, principal product manager at AWS, explain:

Many organizations find that Multi-AZ replication and automatic failover in ElastiCache meet their resilience requirements, but as customers increasingly adopt ElastiCache as a persistent data store, as well as a cache, data loss becomes a primary concern.

ElastiCache can now be used for persistent data workloads in addition to caching, including AI agent memory, workflow state, RAG knowledge bases, payment tokenization, and inventory management.

Source: AWS blog

Both durability modes maintain microsecond-level read latency; they differ in when a write is acknowledged. Synchronous writes are acknowledged only after data has been replicated across at least two AZs, reducing the risk of data loss at the cost of higher write latency. Asynchronous writes are acknowledged before replication completes, preserving lower write latency but with a risk of losing up to 10 seconds of recent data. Lasarte and Konaparthi add:

To bound potential data loss with asynchronous writes, ElastiCache enforces a durability buffer of up to 10 seconds. The primary node continuously tracks the age of the oldest write that has been accepted but not yet persisted to the Multi-AZ transactional log, and publishes this value to Amazon CloudWatch as the DurabilityLag metric. (...) If the buffer grows beyond 10 seconds, for example because of transient network congestion to the transactional log, the primary temporarily rejects incoming write commands until it catches up.
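The bounded-lag behavior described in the quote can be illustrated with a small simulation. The Python sketch below is a toy model, not AWS's implementation and not the ElastiCache or Valkey GLIDE API: the names ToyPrimary, WriteRejected, and write_with_backoff are hypothetical, and only the 10-second bound comes from the article. It shows a primary that tracks the age of its oldest unpersisted write and rejects new writes once that age exceeds the bound, plus a client-side retry loop with exponential backoff and jitter.

```python
import random
import time
from collections import deque

MAX_LAG_SECONDS = 10.0  # the 10-second durability buffer cited in the article


class WriteRejected(Exception):
    """Raised by the toy primary when the lag bound is exceeded (hypothetical)."""


class ToyPrimary:
    """Toy model of a primary with an asynchronous durability buffer."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._pending = deque()  # accept times of writes not yet persisted
        self.data = {}

    def durability_lag(self):
        """Age in seconds of the oldest accepted-but-unpersisted write."""
        if not self._pending:
            return 0.0
        return self._clock() - self._pending[0]

    def write(self, key, value):
        # Reject new writes while the oldest pending write is too old.
        if self.durability_lag() > MAX_LAG_SECONDS:
            raise WriteRejected("durability lag exceeds bound")
        self.data[key] = value
        self._pending.append(self._clock())

    def persist_all(self):
        """Simulate the durable log catching up with every pending write."""
        self._pending.clear()


def write_with_backoff(primary, key, value, attempts=5,
                       base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Retry a rejected write with exponential backoff and full jitter.

    Returns the number of retries that were needed.
    """
    for attempt in range(attempts):
        try:
            primary.write(key, value)
            return attempt
        except WriteRejected:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

In a real application the retry policy would normally be set through the client library's own configuration rather than hand-written; the loop here only makes the recommended behavior concrete.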

Read latency is unaffected, but writes may be temporarily rejected if the durability lag grows too large. Normal operation resumes automatically once the cluster catches up, and the team recommends enabling automatic retries with exponential backoff in clients such as Valkey GLIDE. Corey Quinn, chief cloud economist at The Duckbill Group, warns in his newsletter:

Once again I am begging you to not confuse "cache" with "primary data store." Once again, you will ignore me, as some lessons can only be learned and internalized via SLA breaches.

While ElastiCache supports Valkey, Memcached, and Redis, the new feature is available only for Valkey. On Reddit, developers like the new option but question whether it supersedes Amazon MemoryDB, the Redis-compatible in-memory database designed for applications requiring low-latency and durable data storage.

Durability for ElastiCache is available in all regions for clusters running Valkey 9.0 or later.

About the Author

Renato Losio
