A mistake took down more S3 servers than intended, including two subsystems essential to S3 operation. The resulting outage affected S3 and the other services that depend on it. Normal functioning was restored in about four hours.
Dead code needs to be found and removed. Leaving it in place is an obstacle to programmer understanding and action, and there is a risk that the code is awakened, which can cause significant problems. Deleting dead code is not a technical problem; it is a problem of mindset and culture.
Through improv games, Ted DesMaisons and Lisa Rowland shared three hacks for building a better life: embracing failure, saying "yes," and sharing control.
The cloud, infrastructure as code, federated architectures with APIs, and anti-fragile systems: these technologies for developing software systems are rapidly coming into focus, claimed Mary Poppendieck. Systems are moving towards the cloud, and APIs are replacing central shared databases and enabling the internet of things. We need to develop anti-fragile systems which embrace failure.
Impostor syndrome is the fear of being exposed as a "fraud". People who experience it think that they do not belong where they are, don't deserve the success they have achieved, and are not as smart as other people think. According to Agile Coach Gitte Klitgaard, many high-achieving people suffer from impostor syndrome. It hinders people in their work and stops them from following their dreams.
Spotify wants to be really good at getting it wrong quickly and to be optimized for experimentation, said Marcus Frödin, director of engineering at Spotify. At Spark the Change London 2016, he presented a concept for learning from mistakes and breeding success, and gave examples of failures at Spotify and how the company learned from them.
A roundup of the talks from the second day of DevOps Days Kiel.
At QCon London 2016, Peter Alvaro and Kolton Andrus shared lessons learned from a fruitful collaboration between academia and industry, which ultimately resulted in the creation of a novel method for automating failure injection testing at Netflix. Core learnings included: work backwards from what you know; meet in the middle; and adapt the theory to the reality.
At the microXchg 2016 conference, held in Berlin, Germany, Richard Rodger presented “Surviving Microservices”, a practical guide for developers wanting to keep their microservices architectures ‘healthy and performant’. Key topics discussed in the talk included the benefits of message-oriented systems, pattern matching with inter-service communication, dealing with failure, and Seneca.js.
Failure testing should be a critical part of running your microservices, Kolton Andrus stated in his presentation at the recent Microservices Practitioner Summit. Verifying that your services behave as you expect is something you should do to prevent outages.
InfoQ interviewed Stephen Carver about why bringing in procedures and rules often doesn't help to prevent problems, how to enable communication between engineers working in different companies, how to take learnings from failure to the next level to prevent similar problems, and what engineers can do if they want to influence decisions on developing and releasing products.
A coachRetreat is a "safe to fail" learning platform where participants can try different approaches to coaching. In a coachRetreat participants explore the way that people interact in a given situation and can learn to view a situation from different perspectives to improve their coaching skills. An interview with Oana Juncu, Elad Sofer and Yves Hanoulle.
Russ Olsen gave the opening keynote, "To the Moon", at the GOTO Berlin 2015 conference. InfoQ interviewed him about the drawbacks of doing everything at the same time to meet a deadline, learning from things that went wrong and from things that went right, how little things can kill you in software development, and how to focus and deal with details when doing complex work.
In innovation, the mantra "fail fast" is often used to explain that people should quickly try out ideas and then learn from the things that fail in order to develop new products and services. Some people have challenged the need for failure and have come up with alternative approaches for effective innovation.
Autonomy is one of the core guiding principles at Spotify. It enables employees to make decisions as close as possible to the work that is being done. At the Agile Greece Summit 2015, Kristian Lindwall and Cliff Hazell from Spotify explained why autonomy is at the heart of agility.