On November 18, 2019, chaos engineering platform Gremlin released native Kubernetes support for identifying, targeting, and experimenting on Kubernetes objects to proactively identify service weaknesses. Automatic discovery enables running experiments on Kubernetes infrastructure as it orchestrates containers across hosts.
Gremlin is a platform service that provides chaos experiments for Kubernetes, cloud environments, bare metal, and serverless. Chaos engineering is a testing approach to investigate how complex systems perform under stress, with the goal of identifying failures before they occur. Gremlin provides a framework of attacks that inject faults into a system, such as limiting a critical resource or simulating an unreliable network. Attacks can be grouped as scenarios, which run and record the results of the attacks.
Gremlin's native Kubernetes support enables testing of Kubernetes objects via the Gremlin user interface or API. Prior to this release, running Gremlin attacks on a service in Kubernetes required targeting the containers for that service. Because Kubernetes abstracts container orchestration and regularly destroys and creates containers, targeting specific service containers could be challenging. Gremlin now allows chaos tests for Kubernetes applications to be specified at the service level instead of the container level. According to Gremlin CTO and co-founder Matthew Fornaciari:
Our goal is to provide SRE and DevOps teams that are building and deploying modern applications with the tools and processes necessary to understand how their systems handle failure, before that failure has the chance to impact customers and business.
Gremlin's Kubernetes testing framework automates the process of identifying and targeting Kubernetes primitives such as nodes and pods. Users configuring a network attack can also select which Kubernetes service traffic they want to disrupt.
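To illustrate why object-level targeting is more robust than naming individual containers, the following is a minimal Python sketch of resolving a target by label selector at attack time. It does not use Gremlin's API or reflect how Gremlin implements discovery; the `Pod` class, the `resolve_targets` helper, and all pod names and labels are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Hypothetical, simplified stand-in for a Kubernetes pod."""
    name: str
    labels: dict
    containers: list


def resolve_targets(pods, selector):
    """Return the pods whose labels match every key/value pair in the selector."""
    return [
        pod for pod in pods
        if all(pod.labels.get(key) == value for key, value in selector.items())
    ]


# Snapshot of the cluster before a rollout (all names are made up).
before = [
    Pod("checkout-7d9f-abcde", {"app": "checkout"}, ["checkout", "sidecar"]),
    Pod("cart-5c4b-fghij", {"app": "cart"}, ["cart"]),
]
# After a rollout, the checkout pod has been replaced under a new name.
after = [
    Pod("checkout-8e1a-klmno", {"app": "checkout"}, ["checkout", "sidecar"]),
    Pod("cart-5c4b-fghij", {"app": "cart"}, ["cart"]),
]

# The same selector keeps matching the service even though pod names changed.
selector = {"app": "checkout"}
targets_before = [pod.name for pod in resolve_targets(before, selector)]
targets_after = [pod.name for pod in resolve_targets(after, selector)]
print(targets_before)
print(targets_after)
```

A target expressed as a selector stays valid across pod churn, whereas a hard-coded pod or container name would go stale as soon as Kubernetes replaced it.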
To enable Kubernetes testing, users must first upgrade their Gremlin client via the Helm chart. Once the Gremlin client is upgraded, attacks are created via the UI, where a new Kubernetes option is available. A list of Kubernetes clusters and namespaces is provided for filtering down to the Kubernetes object to be tested. The objects are broken down by Deployment, DaemonSet, ReplicaSet, StatefulSet, and Pod to allow precise targeting of chaos experiments. As objects are selected, Gremlin will provide a map of the cluster, highlighting which areas will be impacted by the experiment.
Gremlin Attack UI from the Gremlin blog.
Once the experiment has been defined, Gremlin will target the underlying containers. As the test is run, the affected containers will be included in the test results report, grouped by the Kubernetes object they belong to. Container details and logs are provided as part of the results of the experiment.
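The grouping described above can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the record fields (`container`, `owner_kind`, `owner_name`), the `group_by_owner` helper, and the sample data are hypothetical and do not represent Gremlin's actual report format.

```python
from collections import defaultdict


def group_by_owner(affected):
    """Group container names under a (kind, name) key for their owning object."""
    grouped = defaultdict(list)
    for entry in affected:
        owner = (entry["owner_kind"], entry["owner_name"])
        grouped[owner].append(entry["container"])
    return dict(grouped)


# Hypothetical list of containers touched by one experiment.
affected_containers = [
    {"container": "checkout-1", "owner_kind": "Deployment", "owner_name": "checkout"},
    {"container": "checkout-2", "owner_kind": "Deployment", "owner_name": "checkout"},
    {"container": "log-agent-1", "owner_kind": "DaemonSet", "owner_name": "log-agent"},
]

report = group_by_owner(affected_containers)
for (kind, name), containers in report.items():
    print(f"{kind}/{name}: {', '.join(containers)}")
```

Keying the report on the owning object rather than on individual containers lets results from short-lived containers be read in terms of the service that was tested.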
Chaos engineering was popularized with Netflix's Chaos Monkey, which randomly terminated instances in production to test how services handle failure. A community-driven Kubernetes implementation of the tool, kube-monkey, similarly deletes Kubernetes pods in a cluster. The open source tool Litmus provides a more configurable testing suite comparable to Gremlin's framework. Litmus enables users to run test suites, capture logs, generate reports and perform chaos tests in Kubernetes environments. Litmus can also be added to CI/CD pipelines as part of an end-to-end testing approach.
The Gremlin integration with Kubernetes is available for Free and Pro users. Targeting Kubernetes within a scenario is not yet available. For more information on using Kubernetes with Gremlin, review the Gremlin docs.