Architecting for Failure at the Guardian.co.uk
Michael Brunton-Spall talks about various types of system failure that can happen, sharing the lessons learned at the Guardian and measures taken to prevent and mitigate failure.
Agile adoption and transformation are sometimes effective, and sometimes not. Is there a common thread to the failures? Does fear have anything to do with it? And what can we expect if we start an agile adoption initiative in an environment that is full of fear?
Usually failures result in anger, frustration and playing the blame game. However, failures are wasted if there is no learning from them. How can Agile teams make failures beautiful?
Philippe Kruchten described the Agile movement as "in some ways a bit like a teenager: very self-conscious, checking constantly its appearance in a mirror, accepting few criticisms..." and shared a list of twenty elephants in the room: uncomfortable issues that are ignored on purpose. The first of these unmentionables is that commercial interests are censoring failures.
Amazon has published a detailed report on the service failure that plagued one availability zone in the US East Region. Online media is full of analysis, commentary, and lessons to be learned from the event.
On December 22nd at 1600 GMT, Skype services started to become unavailable, at first for a small share of users and then for a growing number, until the network went down for about 24 hours. A week later, Lars Rabbe, CIO at Skype, explained what happened in a post-mortem analysis of the outage.
John Allspaw discusses pitfalls to be avoided while troubleshooting failed systems, comparing web operations at scale with practices in aviation and nuclear power industries.
Blake Mizerany presents various ways in which distributed systems can fail and how to recover using Doozer, a highly available, consistent data store.
Justin Sheehy talks about failure and the need to prepare for it, giving some real life examples along with techniques implemented in Riak to make it resilient to faults.
Robert Myers talks about the role played by failure in Agile development, sharing a number of Lean and Agile practices that help teams embrace failure and showing how to interpret the feedback received.
Herbjörn Wilhelmsen discusses the reasons why an SOA project failed while trying to reuse existing resources, and how it succeeded later starting from the same business case with reuse in mind.