As our systems scale, they inevitably become more complex, and that complexity tends to grow exponentially over time. So does our need to understand and navigate it. Chaos engineering is a discipline that lets us refine and recalibrate our understanding of our systems through intentional, careful experimentation in the form of failure injection. This deeper understanding ultimately leads to a better experience for our customers and better outcomes for our businesses.
At Netflix, we’ve embraced chaos engineering since Chaos Monkey was born in 2011. Through the efforts of many amazing engineers, it has gone through several iterations and tools, which eventually evolved into the Failure Injection Testing (FIT) platform and, ultimately, ChAP (a platform for safely automating and running chaos experiments in production). We’ve outlined why this has been so beneficial for the business in a separate IEEE article titled “The Business Case for Chaos Engineering” and in a free e-book from O’Reilly.
One thing I’ve noticed in my experience with chaos engineering at various companies is that each one approaches it differently, depending on its key business objectives, its architectural decisions, and the behaviors and motivations of the people who make up the organization.
I hope that you enjoy the eMag we have created together and that it inspires you to dig deeper into your systems, question your mental models, and use chaos engineering to build confidence in how your systems behave under turbulent conditions. Happy reading!
The InfoQ eMag: Chaos Engineering includes:
- LinkedIn’s Waterbear: Influencing resilient applications - Michael Kehoe describes LinkedIn’s per-application resilience engineering effort, Project Waterbear, and the corresponding suite of tools the team built for running chaos-engineering experiments.
- UIs: Value of the Visual in Chaos Engineering - Patrick Higgins explores the importance of UIs for chaos engineering tools, both as a teaching mechanism and as a way to give engineers strong safety mechanisms, such as the ability to halt experiments that might get out of hand.
- Using Chaos Engineering to Secure Distributed Systems - Aaron Rinehart explores how chaos engineering can be applied to security testing in distributed systems, arguing that it differs from both red/purple-team security testing and penetration testing in its goals, purpose, and methodology.
- Recalibrating Mental Models Through Design of Chaos Experiments - John Allspaw explores how designing and running chaos experiments challenges the ingrained assumptions software engineers make, providing a mechanism for recalibrating the mental models engineers have built up about how their systems work.
InfoQ eMags are professionally designed, downloadable collections of popular InfoQ content - articles, interviews, presentations, and research - covering the latest software development technologies, trends, and topics.