Chaos Engineering Content on InfoQ

  • How to Integrate Infosec and DevOps Using Chaos Engineering

    Kelly Shortridge from Capsule8 talked at the Velocity conference in Berlin about how chaos engineering can help to integrate Infosec within a DevOps culture. Shortridge discussed how distributed, immutable, and ephemeral infrastructure, or the D.I.E. model, is an organizationally friendly way of building security by design. With this model, users can continuously raise the cost of the attack

  • Gremlin Introduces Scenarios, Enabling Real-World Chaos Experiments

    The Gremlin team announced the addition of Scenarios, which allow for the simulation of real-world outages. Scenarios support planning and tracking complex chaos experiments that more closely mimic real-world outages. The release includes prepared Scenarios that can be run out of the box or used as a starting template to build custom incidents.

  • How Did Things Go Right? Learning More from Incidents at Netflix: Ryan Kitchens at QCon New York

    At QCon New York, Ryan Kitchens presented “How Did Things Go Right? Learning More from Incidents”. Key takeaways from the talk included: recovery is better than prevention; an incident occurs when there is a “perfect storm” of events -- there is no root cause; “stop reporting on the nines”, as user happiness is more important; and there is value in learning how things go right.

  • Solo.io Announces Service Mesh Hub and Chaos Engineering Tool

    Solo.io, a cloud native software company, launched the industry's first service mesh hub. The hub provides resources to help users adopt service mesh technology in hybrid and multi-cloud environments and features tools such as Istio, Linkerd, Envoy, AWS App Mesh, and HashiCorp Consul.

  • Summary of Chaos Community Day v4.0: Resilience, Observability, and Gamedays

    Earlier in the year, the fourth edition of “Chaos Community Day” was held at Work-Bench in New York City. Key takeaways from the day included: the topic of chaos engineering draws heavily from other domains, which software engineers can also learn from; understanding systems, and communicating and exchanging the related mental models, is vital for establishing resilience.

  • Chaos Engineering Kubernetes with the Litmus Framework

    Litmus is an open source chaos engineering framework for Kubernetes environments running stateful applications. Created by MayaData, Litmus enables users to run test suites, capture logs, generate reports, and perform chaos experiments.

  • QCon NY (Jun 24-28): New Talks, a Focus on the Skills That Matter & Why You Should Join Us This Year

    In the recent Stack Overflow 9th annual survey of over 90,000 software developers, we learned that non-development work remains a productivity challenge for software managers and leaders. At QCon New York, the conference for senior software developers, we have many sessions to help you learn how others have overcome those challenges.

  • Mature Microservices and How to Operate Them: QCon London Q&A

    Microservices is an architectural approach that keeps systems decoupled so that many changes can be released each day, said Sarah Wells in her keynote at QCon London 2019. To build resilient and maintainable systems, you need things like load balancing across healthy nodes, backoff and retry, and persistence or fanning out of requests via queues. The best way to know whether your system is resilient is to test it.

  • Amplifying Sources of Resilience: John Allspaw at QCon London

    At QCon London, John Allspaw presented “Amplifying Sources of Resilience: What Research Says”. Key takeaways from the talk included: resilience is something a system does, not what a system has; creating and sustaining “adaptive capacity” within an organisation is resilient action; and learning about how people cope with surprise is the path to finding sources of resilience.

  • Gremlin Announces Free Tier for Their Chaos Experimentation Platform

    Gremlin has announced “Gremlin Free”, which provides the ability to run chaos engineering experiments on a free tier of their failure-as-a-service SaaS platform. The current version of the free tier allows the execution of shutdown and CPU attacks on hosts or containers, which can be controlled via a simple web-based user interface, API or CLI.

  • Chaos Engineering Observability: Q&A with Russ Miles

    In a new O’Reilly report, “Chaos Engineering Observability: Bringing Chaos Experiments into System Observability”, the author, Russ Miles, explores why he believes the topics of observability and chaos engineering “go hand in hand”. He argues that as engineers begin to run chaos experiments, they will need to be able to ask many questions about the underlying system being experimented on.

  • Building Production-Ready Applications: Michael Kehoe Shares Lessons Learned from LinkedIn

    At QCon San Francisco, Michael Kehoe presented “Building Production-Ready Applications”. Drawing on his experience with site reliability engineering (SRE), he introduced the tenets of “production-readiness” that all engineers across the organisation should focus on as: stability and reliability; scalability and performance; fault tolerance and disaster recovery; monitoring; and documentation.

  • An Evolution of Chaos Experimentation: Kolton Andrus at ChaosConf 2018

    At the inaugural ChaosConf, held in San Francisco, USA, Kolton Andrus presented an evolution of chaos experimentation over the past eight years. He argued that the human and organisational aspects of dealing with failure should not be ignored, and also suggested that tooling should support application- and request-level targeting of failure injection tests in order to minimise the blast radius.

  • Gremlin Releases Application Level Fault Injection (ALFI) Platform for Targeted Chaos Experiments

    Gremlin Inc has released their second product offering in the “Failure-as-a-Service” domain: Application-Level Fault Injection (ALFI). Building upon their initial platform, which helped engineers create and run chaos experiments at the infrastructure level, ALFI enables failure injection at the application level via a native language library.

  • Russ Miles: Ignored Architects and Chaos Engineering

    At the recent Event-Driven Microservices Conference in Amsterdam, Russ Miles claimed that the biggest challenge for an architect is being ignored. You have great ideas like event-driven microservices, but too often the reaction is that they sound good yet are overly complicated for the needs at hand.
