Gremlin Introduces Scenarios, Enabling Real-World Chaos Experiments

The Gremlin team has announced the addition of Scenarios that allow for simulation of real-world outages. Scenarios allow for the planning and tracking of complex chaos experiments that more closely mimic actual live incidents. The release also included prepared Recommended Scenarios that can be run out of the box or used as a starting template to build custom incidents.

In February, the Gremlin team released Gremlin Free, which lets users run chaos engineering experiments on a free tier of its SaaS platform. Chaos engineering involves injecting errors into distributed systems in a controlled manner to test how the system responds. Kolton Andrus, CEO and co-founder of Gremlin, said that thousands of new customers have started using the platform since that release. However, as Andrus continues:

Many organizations are still struggling to decide which experiments to run, in order to avoid major downtime and outages. That’s exactly why we’ve added templates of real-world outages into our application. Now Gremlin users can easily simulate major failures with a couple of clicks, and validate their systems’ ability to withstand them.

The release included six recommended scenarios that are based on real-world outages. These can be run as is or used as a starting point for customized scenarios. The six included scenarios are:

  1. Traffic spikes: this scenario progressively adds CPU load to selected hosts, ranging from 10% to 100% CPU. It is designed to mimic high-volume traffic spikes, such as those experienced by retail sites during Black Friday.
  2. Unreliable networks: this scenario tests how the system responds when API calls have higher than normal response times.
  3. Unavailable dependency: this scenario mimics one or more microservices being unavailable.
  4. Region evacuation: this scenario permits the testing of a full region failure to validate that traffic flow is not impacted.
  5. Host failure: with this scenario a percentage of hosts can be set to shut down in order to validate that the service recovers as expected.
  6. DNS outage: this scenario can have the primary DNS provider fail in order to validate successfully failing over to a secondary provider.

With this release, users can also create custom chaos experiments. These can be extended from the predefined scenarios listed above or built from scratch. Scenarios allow multiple attacks to be linked together, and the blast radius and magnitude of each failure injection can be adjusted to better assess how the system will handle more complex failures.
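To make the idea of linked attacks more concrete, the following is a minimal sketch of how a scenario could be modeled as an ordered chain of attacks, each with its own magnitude and blast radius. It is not Gremlin's API: the `Attack` and `Scenario` classes, their fields, and the `traffic_spike` helper are all hypothetical names chosen for illustration, and the step values and durations are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attack:
    """One failure injection step (hypothetical model, not Gremlin's API)."""
    kind: str            # e.g. "cpu", "latency", "shutdown"
    magnitude: float     # intensity of the injection, e.g. percent CPU load
    blast_radius: float  # fraction of targeted hosts affected, in (0, 1]
    duration_s: int = 60 # assumed default length of each step


@dataclass
class Scenario:
    """An ordered chain of attacks that run one after another."""
    name: str
    attacks: List[Attack] = field(default_factory=list)

    def then(self, attack: Attack) -> "Scenario":
        # Reject a blast radius outside (0, 1] before linking the attack.
        if not 0.0 < attack.blast_radius <= 1.0:
            raise ValueError("blast_radius must be in (0, 1]")
        self.attacks.append(attack)
        return self

    def total_duration_s(self) -> int:
        return sum(a.duration_s for a in self.attacks)


def traffic_spike(steps=(10, 50, 100), blast_radius=0.25) -> Scenario:
    """Build a progressive CPU-load scenario, loosely modeled on the
    'Traffic spikes' description; the step values are assumptions."""
    scenario = Scenario("traffic-spike")
    for pct in steps:
        scenario.then(Attack("cpu", magnitude=pct, blast_radius=blast_radius))
    return scenario
```

Calling `traffic_spike()` in this sketch yields a scenario with three CPU attacks of increasing magnitude against a quarter of the targeted hosts. Starting with a small blast radius and widening it in later runs is a common way to limit the impact of an experiment.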

Gremlin UI allowing creation of a custom attack as part of a scenario (credit: Gremlin)

Scenarios also support the recording of a hypothesis. This allows for tracking what the expected outcome of the scenario should be when it is run. The results and expectations can also be recorded once the scenario has been executed.

Creation of a hypothesis as part of a scenario (credit: Gremlin)
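As a rough illustration of pairing a hypothesis with observed results, the sketch below records what was expected before a run and what was seen afterwards. The `ScenarioRun` class and its fields are hypothetical and do not reflect how Gremlin stores this information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScenarioRun:
    """Pairs a hypothesis with the observed outcome (hypothetical model)."""
    scenario_name: str
    hypothesis: str                        # expected outcome, written before the run
    observed: Optional[str] = None         # what actually happened
    hypothesis_held: Optional[bool] = None # None until a result is recorded

    def record_result(self, observed: str, hypothesis_held: bool) -> None:
        self.observed = observed
        self.hypothesis_held = hypothesis_held

    def summary(self) -> str:
        if self.hypothesis_held is None:
            return f"{self.scenario_name}: not yet run"
        verdict = "held" if self.hypothesis_held else "did not hold"
        return f"{self.scenario_name}: hypothesis {verdict}"


run = ScenarioRun(
    scenario_name="dns-outage",
    hypothesis="Lookups fail over to the secondary DNS provider",
)
run.record_result("Failover succeeded after a brief delay", hypothesis_held=True)
print(run.summary())
```

Writing the hypothesis down before the run, as in this sketch, keeps the expected outcome from being adjusted after the fact to match what was observed.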

Scenarios, including the recommended scenarios, are available in both the Free and Pro tiers of Gremlin. More information on scenarios can be found in the Gremlin docs or via the #support channel on the Chaos Community Slack.
