Chaos Engineering Content on InfoQ

News

DevOps

Coinbase Postmortem Reveals How a Localized AWS Failure Triggered a Multi-Hour Trading Outage

Coinbase has published a detailed postmortem of its May 7, 2026, outage, revealing how a localized cooling failure inside an AWS data center escalated into a multi-hour disruption that halted nearly all trading activity across the cryptocurrency exchange.

Craig Risi
on Jun 16, 2026
DevOps

Enhancing Reliability Using Service-Level Prioritized Load Shedding: Netflix at QCon SF 2025

At QCon San Francisco, Netflix engineers presented their service-level prioritized load-shedding strategy for maintaining reliability during traffic spikes. By prioritizing high-value requests and automating load-shedding management across microservices, they protect user experience and system stability. Key takeaways emphasize prioritization, automation, and structured load shedding as the basis for resilience.

Steef-Jan Wiggers
on Nov 20, 2025
DevOps

Google Cloud Introduces Chaos Engineering Framework and Recipes for Distributed Systems

Google Cloud's Expert Services Team has released a detailed guide on chaos engineering for cloud-based distributed systems. It highlights that the intentional creation of failures is essential for developing resilient architectures. The initiative provides open-source recipes and helpful guidance for applying controlled disruption testing in Google Cloud environments.

Claudio Masolo
on Nov 12, 2025
DevOps

How Google Does Chaos Testing to Improve Spanner's Reliability

To ensure their Spanner database keeps working reliably, Google engineers use chaos testing to inject faults into production-like instances and stress the system's ability to behave correctly in the face of unexpected failures.

Sergio De Simone
on May 21, 2024
Cloud

Chaos Engineering Service Azure Chaos Studio Now Generally Available

Two years after entering public preview, reliability experimentation service Azure Chaos Studio is now generally available. Among its most recent features are experiment templates, dynamic targets, load testing faults, and more.

Sergio De Simone
on Nov 30, 2023
Architecture & Design

Filibuster: Automated Fault Injection Tool to Improve DoorDash's Reliability

DoorDash recently revealed how they are using Filibuster, an automated fault injection tool, to identify resilience issues in microservice applications early on and improve platform reliability.

Tanmay Deshpande
on Sep 26, 2022
Cloud

Microsoft Announces Azure Chaos Studio in Public Preview

At the recent Ignite, Microsoft announced the public preview of Azure Chaos Studio, a fully-managed experimentation service to help customers track, measure, and mitigate faults with controlled chaos engineering to improve the resilience of their cloud applications.

Steef-Jan Wiggers
on Nov 10, 2021
DevOps

Litmus 2.0 Release Includes Multi-Tenancy, Chaos Workflows, GitOps, and Observability

Last month, Litmus 2.0 was released for general availability, with the goal of simplifying chaos engineering by adding new features like chaos center, chaos workflows, GitOps for chaos, multi-tenancy, observability, and private chaos hubs. InfoQ interviewed Umasankar Mukkara, CEO of ChaosNative and co-creator and maintainer of the Litmus chaos engineering platform.

Feynman Zhou
on Sep 24, 2021
DevOps

Gremlin Adds Automated Service Discovery for Targeting Chaos Experiments

Gremlin, a chaos engineering platform, recently announced automated service discovery. This new feature automatically discovers services running within dynamic environments and makes them available as targets for chaos experiments. Gremlin has also added role-based access control for its API keys.

Matt Campbell
on May 03, 2021
DevOps

Cheryl Hung on Trends in Cloud Native and DevOps for 2021

In a recent keynote for The DEVOPS Conference, Cheryl Hung, VP of ecosystem at the Cloud Native Computing Foundation (CNCF), shared her top 10 predictions for cloud native in the upcoming year. These include improvements in cross-cloud support, growth in GitOps and chaos engineering practices, and an increase in the adoption of FinOps.

Matt Campbell
on Apr 11, 2021
DevOps

InfoQ Live March 16: Explore Ways of Reducing Uncertainty in Software Delivery

InfoQ Live, the one-day virtual event for software engineers and architects, returns on March 16th with a new edition, this time focusing on ways to reduce the uncertainty of your software development cycle.

Adelina Turcu
on Mar 04, 2021
DevOps

Gremlin Aims to Reduce Kubernetes Noisy Neighbours through Chaos Engineering

Gremlin has released enhancements to its Chaos Engineering platform aimed at DevOps engineers interested in future-proofing Kubernetes clusters by isolating "noisy neighbours". On Kubernetes, the noisy neighbour issue occurs when multiple applications sharing a cluster compete for resources, leading to degraded performance.

Rupert Field
on Mar 02, 2021
DevOps

Gremlin Releases State of Chaos Engineering 2021 Report

Gremlin released their State of Chaos Engineering 2021 report based on a community survey and their own product data. The key findings include a positive correlation between running chaos engineering experiments and increased availability.

Hrishikesh Barua
on Feb 20, 2021
DevOps

AWS Announces Chaos Engineering as a Service Offering

AWS has announced the upcoming release of their chaos engineering as a service offering. The Fault Injection Service (FIS) will provide fully-managed chaos experiments across a number of AWS services. The service includes pre-built templates that generate disruptions mimicking common real-world events. It can be integrated into CI pipelines via API.

Matt Campbell
on Dec 21, 2020
DevOps

Chaos Engineering on Kubernetes: Chaos Mesh Generally Available with v1.0

The Chaos Mesh team announced the general availability (GA) of Chaos Mesh 1.0 after it was accepted as a CNCF sandbox project in July 2020. Chaos Mesh is a tool to perform chaos engineering experiments on Kubernetes applications.

Hrishikesh Barua
on Oct 18, 2020
