Failure Content on InfoQ

  • What Resiliency Means at Sportradar

    At this year's QCon London conference, Pablo Jensen, CTO at Sportradar, talked about the practices and procedures in place at Sportradar to ensure their systems meet expected resiliency levels. Jensen described how reliability is influenced not only by technical concerns but also by organizational structure, governance, and client support, and how it requires ongoing effort to continuously improve.

  • Chaos Engineering at Twilio

    The Twilio team describes their foray into Chaos Engineering, in which they use Gremlin to inject failures into the shards of their homegrown queuing system to test for automated recovery.

  • How to Measure Continuous Delivery

    Stability and throughput are the things that you can measure when adopting continuous delivery practices. These metrics can help you reduce uncertainty, make better decisions about which practices to amplify or dampen, and steer your continuous delivery adoption process in the right direction.

  • Public Preview of Azure IaaS Disaster Recovery Announced

    In a recent announcement, Microsoft released details about its public preview for Infrastructure-as-a-Service (IaaS) disaster recovery using Azure Site Recovery (ASR). Using the ASR service, organizations can protect IaaS workloads in one Azure region and have them replicated to a different Azure region within a geographical cluster.

  • A Human Error Took Down AWS S3 US-EAST-1

    A mistake took down more S3 servers than intended, including two subsystems essential to S3 operation. This resulted in an S3 failure that affected the S3 service and other services depending on it. Normal functioning was restored in about four hours.

  • Dead Code Must Be Removed

    Dead code needs to be found and removed; leaving dead code in is an obstacle to programmer understanding and action, and there is a risk that the code is awakened, which can cause significant problems. Deleting dead code is not a technical problem; it is a problem of mindset and culture.

  • The Improvisor's Code and QConSF

    Through improv games, Ted DesMaisons and Lisa Rowland shared three hacks for building a better life - embracing failure, saying "yes," and sharing control.

  • Technologies for the Future of Software Engineering

    The Cloud, infrastructure as code, federated architectures with APIs, and anti-fragile systems: these are technologies for developing software systems that are rapidly coming into focus, claimed Mary Poppendieck. Systems are moving towards the cloud, and APIs are replacing central shared databases and enabling the internet of things. We need to develop anti-fragile systems which embrace failure.

  • Dealing with the Impostor Syndrome

    The impostor syndrome refers to people who fear being exposed as a "fraud". They think that they do not belong where they are, don't deserve the success they have achieved, and are not as smart as other people think. According to Agile Coach Gitte Klitgaard, many high-achieving people suffer from the impostor syndrome. It hinders people in their work and stops them from following their dreams.

  • Spotify Wants To Be Good at Failing

    Spotify wants to be really good at getting it wrong quickly and to be optimized for experimentation, said Marcus Frödin, director of engineering at Spotify. At Spark the Change London 2016 he presented a concept for learning from mistakes to breed success, and gave examples of failures at Spotify and how they learned from them.

  • DevOps Days Kiel Day 2

    Roundup of the talks from the second day of DevOps Days Kiel.

  • “Monkeys in Lab Coats”: Applied Failure Testing Research at Netflix

    At QCon London 2016 Peter Alvaro and Kolton Andrus shared lessons learned from a fruitful collaboration between academia and industry, which ultimately resulted in the creation of a novel method for automating failure injection testing at Netflix. Core learnings included: work backwards from what you know; meet in the middle; and adapt the theory to the reality.

  • "Surviving Microservices" with Richard Rodger at microXchg: Messages, Pattern Matching and Failure

    At the microXchg 2016 conference, held in Berlin, Germany, Richard Rodger presented “Surviving Microservices”, a practical guide for developers wanting to keep their microservices architectures ‘healthy and performant’. Key topics discussed in the talk included the benefits of message-oriented systems, pattern matching with inter-service communication, dealing with failure, and Seneca.js.

  • Failure Testing of Microservices

    Failure testing should be a critical part of running your microservices, Kolton Andrus stated in his presentation at the recent Microservices Practitioner Summit. Verifying that your services behave as you expect is something you should do to prevent outages.

  • Organisational Learning and the Importance of Real Communication

    InfoQ interviewed Stephen Carver about how bringing in procedures and rules often doesn't help to prevent problems, how to enable communication between engineers working in different companies, how to take learnings from failure to the next level to prevent similar problems, and what engineers can do if they want to influence decisions on developing and releasing products.
