At the recent Ignite conference, Microsoft announced the public preview of Azure Chaos Studio, a fully managed experimentation service that helps customers track, measure, and mitigate faults through controlled chaos engineering, improving the resilience of their cloud applications.
With Azure Chaos Studio, users can apply chaos engineering, a method of experimenting with controlled fault injection against their applications, to measure, understand, and improve resilience against real-world incidents such as region outages or memory leaks in an application. According to a recent InfoQ DevOps and Cloud Trends report, chaos engineering is seeing growing adoption and is quickly becoming a standard way to test complex systems before real-world outages put those systems to the test.
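To make the idea of controlled fault injection concrete, here is a minimal, generic sketch in Python. It does not use Azure Chaos Studio or any of its APIs; every name in it (`with_fault_injection`, `call_with_retries`, `run_experiment`) is hypothetical and chosen only for illustration. It wraps a stand-in dependency so that a configurable fraction of calls fail on purpose, then measures how often a simple retry policy still succeeds.

```python
import random


class InjectedFault(Exception):
    """Raised in place of a real dependency failure (illustrative only)."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of its calls fail on purpose."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("simulated dependency outage")
        return func(*args, **kwargs)
    return wrapper


def call_with_retries(func, attempts):
    """Hypothetical resilience mechanism under test: plain retries."""
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault:
            continue
    return None  # every attempt hit an injected fault


def run_experiment(failure_rate, attempts, requests=1000, seed=0):
    """Return the fraction of requests that succeed under injected faults."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    dependency = with_fault_injection(lambda: "ok", failure_rate, rng)
    successes = sum(
        call_with_retries(dependency, attempts) == "ok"
        for _ in range(requests)
    )
    return successes / requests


if __name__ == "__main__":
    for attempts in (1, 3):
        rate = run_experiment(failure_rate=0.3, attempts=attempts)
        print(f"attempts={attempts} success_rate={rate:.3f}")
```

Comparing the success rate with one attempt against the rate with three gives a rough, repeatable measure of how much the retry policy improves resilience under the injected fault. A managed service applies the same idea to real infrastructure rather than to an in-process stub.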
Azure Chaos Studio lets customers model and deliberately introduce faults that simulate real-world outages and validate how their application performs under those scenarios. Microsoft uses the same capabilities internally that the service now offers to customers. In a Microsoft Tech Community blog post, John Engel-Kemnetz, program manager for Microsoft Azure Chaos Studio, points out the need for the service:
Whether you are developing a new application that will be hosted on Azure, migrating an existing application to Azure, or operating an application that already runs on Azure, it is important to improve your application's ability to handle and recover from disruptions that can negatively impact your customers' experience and erode their trust in your business or mission. To avoid these negative consequences, you need to validate that your application responds effectively to disruptions that could be caused by a service you depend on, disruptions caused by a failure in the service itself, or even disruptions to incident response tooling and processes.
Source: https://docs.microsoft.com/en-us/azure/chaos-studio/chaos-studio-tutorial-service-direct
Microsoft's Azure Chaos Studio is not the first tool supporting chaos engineering principles. In 2012, Netflix's engineering team launched Chaos Monkey, which became widely used, and other tools such as Gremlin, Chaos Mesh, Litmus, and ChaosBlade came later. AWS, a public cloud competitor of Microsoft, has also launched a service similar to Azure Chaos Studio, AWS Fault Injection Simulator.
Tom Morgan, a Microsoft Teams Platform developer and Microsoft MVP, described in his blog post the benefit of having chaos engineering as a service on Azure:
I think this is great for any Azure-based solution, including Teams Apps. Plenty of Teams Dev solutions use multiple Azure technologies such as Web Apps and Table Storage which need to work together. Furthermore, interruptions to any component can often have unexpected side effects, so testing out how solutions perform in less-than-ideal conditions will result in more resilient solutions and able to cope with issues in the real world.
Azure Chaos Studio is currently free of charge. From April 4th, 2022, customers will be billed on a pay-as-you-go basis for experiment execution, according to the pricing details page. Further details and guidance on the service are available on the documentation landing page.