Chaos Testing of Microservices

by Jan Stenberg on Mar 16, 2016

The world is naturally chaotic, and we should both plan for this chaos and test that our systems can handle it, Rachel Reese claimed at the recent QCon London conference, where she described how Jet, an e-commerce company launched in July 2015, works with microservices and chaos engineering.

Reese emphasizes how important it is to test the interactions in your environment. Even when all components have been tested individually, that does not mean the interactions between them are solid or that the components can be used together in production; the interactions have to be tested as well. She calls Jet a “the right tool for the right job” company, and for her, chaos testing is one of the right tools.

Reese defines a microservice as an application of the Single Responsibility Principle (SRP) at the service level. Because Jet takes a functional view of microservices, she adds that a microservice has an input and produces an output. The benefits she sees in using microservices include simplified scalability, the ability to release independently, and a more even distribution of complexity. Jet runs somewhere between 400 and 1,000 microservices spread over 10-15 teams, mainly written in F# (a functional-first programming language).

Reese notes that chaos engineering is not about wreaking havoc with the code for fun. Instead, she defines it as:

Controlled experiments on a distributed system that help to build confidence in the system’s ability to tolerate the inevitable failures.

Referring to Principles of Chaos, Reese defines four steps in chaos engineering:

  1. Define “normal” (the normal state of the system).
  2. Assume “normal” will continue in both a control group and an experimental group.
  3. Introduce chaos: servers that crash, hard drives that malfunction, network connections that are severed, etc.
  4. Look for a difference in behaviour between the control group and the experimental group.
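
The four steps above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how Jet or any chaos tool implements them: the names `success_rate` and `run_experiment`, the failure rates, and the request counts are all hypothetical, and a random number generator stands in for a real service.

```python
import random


def success_rate(rng, failure_rate, requests=10_000):
    """Fraction of simulated requests that succeed at the given failure rate."""
    successes = sum(rng.random() >= failure_rate for _ in range(requests))
    return successes / requests


def run_experiment(baseline_failure=0.01, injected_failure=0.05, seed=42):
    """Toy walk-through of the four steps; all numbers are made-up assumptions."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable

    # Step 1: define "normal" as the expected success rate of the service.
    normal = 1.0 - baseline_failure

    # Step 2: the control group runs undisturbed and is assumed to stay "normal".
    control = success_rate(rng, baseline_failure)

    # Step 3: introduce chaos into the experimental group. The extra failure
    # rate stands in for crashed servers or severed network connections.
    experimental = success_rate(rng, baseline_failure + injected_failure)

    # Step 4: look for a difference in behaviour between the two groups.
    return {
        "normal": normal,
        "control": control,
        "experimental": experimental,
        "difference": control - experimental,
    }


if __name__ == "__main__":
    print(run_experiment())
```

In a real experiment the two success rates would come from monitoring data for real traffic, and a large difference would indicate that the system does not tolerate the injected failure.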

More specifically this means:

  • Build a hypothesis, defining the normal behaviour and state of the system in terms of measures such as throughput and latency.
  • Vary real-world events, such as spikes in traffic and other things that can make a system behave chaotically.
  • Run experiments in production to guarantee authenticity of the tests.
  • Automate experiments to run continuously.
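
One way to make the hypothesis in the first point checkable by an automated experiment is to express "normal" as explicit thresholds. The sketch below is an assumption for illustration only: `SteadyState`, `percentile`, `hypothesis_holds`, and the choice of a 99th percentile latency limit are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SteadyState:
    """Hypothetical definition of "normal" for one service."""
    min_throughput_rps: float   # requests per second the service should sustain
    max_p99_latency_ms: float   # upper bound on 99th percentile latency


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling division
    return ordered[rank - 1]


def hypothesis_holds(steady, latencies_ms, window_seconds):
    """Check observed request latencies from one time window against "normal"."""
    throughput = len(latencies_ms) / window_seconds
    p99 = percentile(latencies_ms, 99)
    return (throughput >= steady.min_throughput_rps
            and p99 <= steady.max_p99_latency_ms)


if __name__ == "__main__":
    steady = SteadyState(min_throughput_rps=50, max_p99_latency_ms=200)
    observed = [20] * 990 + [500] * 10  # made-up latency samples in milliseconds
    print(hypothesis_holds(steady, observed, window_seconds=10))
```

An automated experiment could evaluate such a check continuously for both the control and the experimental group and flag any window in which the hypothesis stops holding.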

The benefits of chaos engineering that Reese has found include:

  • Outages triggered by testing occur during the daytime, instead of problems having to be fixed at 3 a.m.
  • Engineers start to design for failure.
  • It makes systems healthier by preventing outages from happening later on.

Looking at their own experience, Reese notes that Jet is not yet testing in production. As a start-up, the company's primary objectives have been launching and getting everything right. Right now they run chaos tests in QA at random times throughout the daytime.
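
Picking a random time within the daytime could be sketched as follows. This is a minimal illustration, not Jet's scheduler: the function name `next_chaos_time` and the 09:00 to 17:00 window are assumptions, since the article does not say which hours count as daytime.

```python
import random
from datetime import datetime, timedelta


def next_chaos_time(day, rng, start_hour=9, end_hour=17):
    """Pick a random moment on `day` between start_hour and end_hour.

    The default hours are an assumption made for this sketch.
    """
    window_start = day.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    window_seconds = (end_hour - start_hour) * 3600
    # randrange excludes the upper bound, so the result is always before end_hour.
    return window_start + timedelta(seconds=rng.randrange(window_seconds))


if __name__ == "__main__":
    print(next_chaos_time(datetime(2016, 3, 16), random.Random()))
```

Keeping the window inside working hours means that any failure a test uncovers appears while engineers are available to investigate it.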

One of their most “interesting” disasters happened a few months ago, when their manual testers noticed that the search engine was down, resulting in cascading issues downstream. The failure occurred because the chaos testing had restarted the search engine in the wrong way. Thanks to this single failure they were able to find 5-6 different issues.

Reese concludes by claiming:

If availability matters, you should be testing for it.

Reese’s presentation is already available for QCon attendees, and will later be available for readers of InfoQ.
