John Allspaw discusses pitfalls to be avoided while troubleshooting failed systems, comparing web operations at scale with practices in aviation and nuclear power industries.
Blake Mizerany presents various ways that can lead to system failure in distributed systems and how to recover using Doozer, a highly available, consistent data store.
Justin Sheehy talks about failure and the need to prepare for it, giving some real life examples along with techniques implemented in Riak to make it resilient to faults.
Robert Myers talks about the role played by failure in Agile development, sharing a number of Lean and Agile practices helping to embrace failure and showing how to interpret the feedback received.
Herbjörn Wilhelmsen discusses the reasons why an SOA project failed while trying to reuse existing resources, and how it succeeded later starting from the same business case with reuse in mind.
Justin Sheehy explains the principles behind concurrent distributed systems: no global state, no ACID but rather BASE, no RPC but protocols over APIs, prepare for failure, degradation, measurement.
Michael Nygard encourages us to have a failure oriented mindset. He presents many anti-patterns leading to systems instability and failure, accompanied by design patterns that should be used instead.
Henrik Kniberg talks about 10 possible reasons to fail while doing Scrum and XP. Maybe the team does not have a definition of what Done means to them, or they don't know what their velocity is.
David Douglas and Robin Dymond discuss about companies adopting Agile, but don't go all the way, resulting in failure and rejection of it, and predictably having a negative impact on Agile's future.