Resilience Content on InfoQ

  • Netflix Engineer Lorin Hochstein on Chaos Monkey 2.0

    Netflix made waves when it initially announced Chaos Monkey, a tool that would terminate normally healthy VM instances in production. The goal was to embrace failure and thereby increase resiliency. Rags Srinivas caught up with Lorin Hochstein at Netflix regarding the recent upgrade to Chaos Monkey.

  • Chaos Monkey 2.0 Runs via Spinnaker

    Netflix has recently made the source code of Chaos Monkey 2.0 available. The latest iteration of the resilience tool is fully integrated with Spinnaker and event tracking systems, but SSH support has been removed.

  • DevOps Days Kiel Day 2

    Roundup of the talks from the second day of DevOps Days Kiel.

  • Google Kick-Starts Git Ketch: A Fault-Tolerant Git Management System

    Although development has only started, Google has announced their first commits of Git Ketch, a multi-master Git management system that replicates information across multiple Git servers for resilience and scalability. The changes are based on JGit, a Java-based Git server, although other Git servers may be part of the multi-master cluster.

  • Microsoft Makes Available Their Platform for Building Microservices

    Microsoft has announced and made available the preview of Azure Service Fabric (ASF), a cloud platform including a runtime and lifecycle management tools for creating, deploying, running and managing microservices. ASF microservices can be deployed on Azure or on-premises on Windows Server private or hosted clouds. Support for Linux is to come in the future.

  • Anti-patterns for Handling Failure

    Oliver Hankeln shares the anti-patterns he found for handling failure in organizations: hiding mistakes, engaging in the blame game, the arc of escalation, and cowardice. He then suggests corrective actions for each of them.

  • How Netflix Handled the Reboot of 218 Cassandra Nodes

    Amazon performed a major maintenance update at the end of September to patch a security vulnerability in the Xen hypervisor affecting about 10% of its global fleet of cloud servers. The update required rebooting those servers, with consequences for AWS users and the services they provide, including one of its largest clients, Netflix.

  • TypeSafe's Kevin Webber: Actor-based Concurrency for Reactive Systems

    In a recent article on Medium, TypeSafe's Kevin Webber argues that reactive programming "isn’t just another trend but rather the paradigm for modern software developers to learn" since it helps them to build systems that are responsive, resilient, and scalable. He also suggests that actor-based concurrency is the most convenient foundation for a reactive system.

  • Refreshed AWS Trusted Advisor Offers Several Free Checks

    Amazon Web Services (AWS) has recently integrated the AWS Trusted Advisor into the AWS Management Console and made four security and service limit checks available at no charge. Additional checks from the security, performance, fault tolerance and cost optimization categories remain part of their Business and Enterprise support tiers.

  • Testing Resiliency at PagerDuty Without a Simian Army

    Doug Barth of PagerDuty spoke at DevOps Days London about how the company began resiliency testing its systems without dedicating a lot of automation effort upfront. The goal was to quickly start learning about failure points and openly discuss how to fix them with only one hour per week of effort.

  • Amazon Web Services Stability and the September 13th US East 1 Outage

    Amazon Web Services (AWS) suffered another outage of its US East 1 region during the morning of Friday 13th September. A number of popular applications such as Heroku, GitHub and CMSWire were disrupted, along with many other customers in Amazon’s largest, oldest and busiest location.
