The InfoQ eMag: Chaos Engineering

by InfoQ, reviewed by Nora Jones on Dec 06, 2018

About the Author: InfoQ is facilitating the spread of knowledge and innovation in professional software development. InfoQ content is currently published in English, Chinese, Japanese and Brazilian Portuguese. With a readership base of over 1,400,000 unique visitors per month reading content from 100 locally-based editors across the globe, we continue to build localized communities.

As our systems scale, they take on more complexity, which inherently increases exponentially over time. The need to understand and navigate this complexity increases with it. Chaos engineering is a discipline that allows us to refine and recalibrate our understanding of our systems through intentional and careful experimentation in the form of failure injection. This greater understanding ultimately leads to a better experience for our customers and better outcomes for our businesses.

At Netflix, we’ve been embracing chaos engineering since Chaos Monkey was born in 2011. It has gone through several iterations and tools, eventually evolving into the Failure Injection Testing (FIT) platform and, ultimately, ChAP (a platform for safely automating and running chaos experiments in production), through the efforts of many amazing engineers. We’ve outlined why this has been so beneficial for the business in a separate IEEE article titled “The Business Case for Chaos Engineering” and in a free e-book from O’Reilly.

One thing I’ve noticed in my experiences with chaos engineering at various companies is that each approaches it differently, based on its key business objectives, its architectural decisions, and the behaviors and motivations of the people who make up the organization.

I hope that you enjoy the eMag we have created together and that it inspires you to dig deeper into your systems, question your mental models, and use chaos engineering to build confidence in your system’s behaviors under turbulent conditions. Happy reading!

The InfoQ eMag: Chaos Engineering includes:

  • LinkedIn’s Waterbear: Influencing resilient applications - Michael Kehoe describes LinkedIn’s per-application resilience engineering effort called Project Waterbear and the corresponding suite of tools they built for running chaos-engineering experiments.
  • UIs: Value of the Visual in Chaos Engineering - Patrick Higgins explores the importance of UI for chaos engineering tools, both as a teaching mechanism and as a way to give engineers strong safety mechanisms that allow them to, for example, halt experiments that might get out of hand.
  • Using Chaos Engineering to Secure Distributed Systems - Aaron Rinehart explores how chaos engineering can be applied to security testing in distributed systems, arguing that it differs from both red/purple-team security testing and penetration testing in its goals, purpose, and methodology.
  • Recalibrating Mental Models Through Design of Chaos Experiments - John Allspaw explores how designing and running chaos experiments challenges integrant assumptions made by software engineers, providing a mechanism for recalibrating the mental models the engineers have built up about how the system works.

InfoQ eMags are professionally designed, downloadable collections of popular InfoQ content - articles, interviews, presentations, and research - covering the latest software development technologies, trends, and topics.