Recovery Content on InfoQ
News
From Outages to Order: Netflix’s Approach to Database Resilience with WAL
Netflix uses a Write-Ahead Log (WAL) system to improve data platform resilience, addressing data loss, replication entropy, multi-partition failures, and corruption. WAL decouples producers and consumers, leverages SQS/Kafka with dead-letter queues, and supports delay queues, cross-region replication, and multi-table mutations for high-throughput, consistent, and recoverable database operations.
-
AWS Launches EBS Volume Clones for Instant, Crash-Consistent Data Copies
AWS has unveiled Volume Clones for Amazon EBS, enabling instant, point-in-time copies of storage volumes with a simple API call. This feature provides rapid access with single-digit millisecond latency, ideal for quick test setups and development. It integrates with the EBS CSI driver, but users should understand its limitations, especially around encryption and management.
-
UniSuper’s Entire Infrastructure Deleted by Internal Google Cloud Error
UniSuper, an Australian superannuation fund manager that uses Google Cloud under an Infrastructure-as-a-Service (IaaS) contract, found it had no disaster recovery (DR) recourse when its entire infrastructure subscription was deleted.
-
Dealing with Thundering Herd at Braintree
Braintree engineer Anthony Ross explained in a recent article how introducing random jitter into the retry intervals for failed tasks solved a thundering herd issue that was impacting the efficiency of their payment dispute management API.
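As a rough illustration of the idea, here is a minimal sketch of a retry delay with random jitter. It is not Braintree's implementation: the exponential backoff schedule, the "full jitter" variant, the function name `backoff_with_jitter`, and the `base` and `cap` defaults are all assumptions chosen for the example.

```python
import random


def backoff_with_jitter(attempt, base=1.0, cap=60.0, rng=random):
    """Return a retry delay in seconds for a 0-based attempt number.

    Hypothetical helper: the delay ceiling grows exponentially with the
    attempt number, and the actual delay is drawn uniformly below it.
    """
    # Exponential ceiling, limited by `cap` so delays do not grow unbounded.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick a random point in [0, ceiling] so that tasks which
    # failed at the same moment do not all retry at the same moment.
    return rng.uniform(0.0, ceiling)


if __name__ == "__main__":
    # Print the randomized delays for the first five retries of one task.
    for attempt in range(5):
        print(attempt, round(backoff_with_jitter(attempt), 3))
```

Without the random component, every task that failed together would compute the same delay and hit the API again in one burst; spreading the delays over an interval is what breaks up the herd.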
-
Amazon EC2 Introduces Automatic Recovery of Instances by Default
Amazon recently announced that EC2 instances will now automatically recover if they become unreachable due to underlying hardware issues. Automatic recovery migrates the instance to different hardware while retaining its instance ID, private IP addresses, Elastic IP addresses, and metadata.
-
AWS Releases Amazon Route 53 Application Recovery Controller into General Availability
AWS recently announced the general availability (GA) of Amazon Route 53 Application Recovery Controller, a new set of capabilities in Amazon Route 53. These capabilities make it easier for customers to continuously monitor their applications’ ability to recover from failures and to control recovery across AWS Regions, Availability Zones, and on-premises infrastructure.