Disaster Recovery Content on InfoQ
-
Azure Front Door Outage: How a Single Control-Plane Defect Exposed Architectural Fragility
A recent 9-hour Azure Front Door (AFD) outage was triggered by a faulty control-plane configuration change that bypassed safety checks because of a software defect. The resulting blast radius was massive, reaching M365 and Entra ID through identity coupling and exposing a critical architectural anti-pattern in centralized edge fabrics.
-
AWS Simplifies Multi-Region Failover with ARC Region Switch
AWS's Amazon Application Recovery Controller Region Switch revolutionizes multi-region failover with a fully-managed, centralized solution. Simplifying disaster recovery, it automates and coordinates essential tasks across AWS services. With proactive validation and a global dashboard, it transforms complex processes into confident, push-button drills, enhancing reliability and cost efficiency.
-
Google Cloud Introduces Non-Disruptive Cloud Storage Bucket Relocation
Google Cloud's innovative Cloud Storage bucket relocation feature enables seamless, non-disruptive data migration across regions while preserving metadata and minimizing application downtime. Maintain governance, enhance lifecycle management, and leverage insights for optimized storage—all without altering access paths. Experience efficient, low-latency solutions tailored for your needs.
-
Figma's $300,000 Daily AWS Bill Highlights Cloud Dependency Risks
Figma's IPO filing reveals a staggering $300,000 daily spend on AWS, roughly $100 million annually, or about 12% of its $821 million revenue. The company's deep reliance on AWS exposes it to significant risks, including potential outages and policy changes. This highlights the critical dilemma for tech firms: balancing the benefits of cloud agility with rising costs and vendor lock-in challenges.
-
Microsoft Enhances Azure Elastic SAN with Auto Scale, Snapshot Support, and CRC Protection
Microsoft's Azure Elastic SAN, launched in early 2024, revolutionizes cloud block storage with unique autoscale capabilities, snapshot support, and CRC protection for enhanced data integrity. This fully managed solution simplifies storage management and optimizes costs, making it ideal for businesses seeking efficient, high-availability solutions in the cloud.
-
How Monzo Bank Built a Cost-Effective, Unorthodox Backup System to Ensure Resilient Banking
Monzo Bank recently revealed Stand-in, an independent backup system on GCP that ensures essential banking services remain operational during application and AWS infrastructure outages. Unlike traditional backups, it's a minimal stand-alone system that exclusively supports key operations and features a cost-effective design, resulting in 1% of the operational costs of the primary deployment.
-
Amazon Aurora Introduces Global Database Writer Endpoint for Distributed Applications
Amazon Aurora has recently introduced a Global Database writer endpoint to streamline routing for applications in disaster-recovery scenarios. This highly available global endpoint removes the need for application code changes to reestablish connectivity following a cross-region switchover or failover operation.
-
Microsoft's Customer Managed Planned Failover Type for Azure Storage Available in Public Preview
Microsoft’s new customer-managed planned failover for Azure Storage enhances disaster recovery by enabling geo-redundancy without data loss or reconfiguration. This proactive solution supports business continuity during outages and large-scale disasters, aligning with competitive offerings from AWS and Google Cloud.
-
Microsoft Announces Public Preview of Geo-Replication Feature for Azure Service Bus Premium Tier
Microsoft recently announced the public preview of its new Geo-Replication feature in the Azure Service Bus premium tier. This feature allows continuous replication of a namespace's metadata and data from a primary region to a secondary region, which users can promote at any time.
-
UniSuper’s Entire Infrastructure Deleted by Internal Google Cloud Error
UniSuper, an Australian superannuation fund manager that uses Google Cloud under an Infrastructure-as-a-Service (IaaS) contract, found it had no disaster recovery (DR) recourse when its entire infrastructure subscription was deleted.
-
Disaster Recovery Across a Million Pieces: Michelle Brush at QCon San Francisco
On the second day of QCon San Francisco 2023, Michelle Brush, an engineering director for SRE at Google, discussed challenges, patterns, and practices for disaster recovery in massively distributed systems. Her session was part of the "Designing for Resilience" track.
-
Google Introduces Cloud Backup and Disaster Recovery
Google recently introduced Cloud Backup and Disaster Recovery (DR), allowing customers to enable centralized backup management directly from the Google Cloud console. The new backup and recovery service is designed to work with cloud storage repositories, databases, and applications.
-
How to Prepare for the Unexpected: an InfluxData Outage Story Told at KubeCon EU 22
Cloud applications promise high availability and accessibility to their users, but achieving that requires a disaster recovery plan. At KubeCon EU 22, the team behind InfluxDB shared the lessons they learned from battle-testing their disaster recovery strategy on the day they deleted production.
-
Amazon Introduces S3 Batch Replication to Replicate Existing Objects
Amazon recently introduced Batch Replication for S3, an option to replicate existing objects and synchronize buckets. The new feature is designed for use cases such as setting up disaster recovery, reducing latency, or transferring ownership of existing data.
-
Amazon Announces Elastic File System Replication for Multi-Region Deployments
Amazon recently announced Elastic File System Replication to keep an up-to-date copy of a network file system in a second AWS region or within the same region.