Failure Content on InfoQ
-
Culturing Resiliency with Data: a Taxonomy of Outages
Ranjib Dey overviews the categorization of outages that happened at Uber in the past few years based on root cause types.
-
Failing over without Falling over
Adrian Cockcroft shows how to use System Theoretic Process Analysis (STPA), as advocated by Professor Nancy Leveson’s team at MIT, to analyze failover hazards.
-
Congratulations, You’ve Failed! Continuously Learning through Failed Experiments
Claire Laurence and Sara Elshahawy share the lessons they learned from embracing experiments and failures, and how those allowed them to be innovative and creative.
-
#FAIL
Kevlin Henney keynotes on some of the failures that people had in various projects and the lessons to be learned from them.
-
Rules in Agile Transformation: 80/20 and “Not Everybody Likes to Dance”
Zbigniew Piecuch discusses why some teams do not manage to master Agile.
-
What Breaks Our Systems: A Taxonomy of Black Swans
Laura Nolan talks about Black Swan events - unforeseen, unanticipated, and catastrophic incidents - that may happen in production and can take the system down.
-
How Did Things Go Right? Learning More from Incidents
Ryan Kitchens describes more rewarding ways to approach incident investigation without overly focusing on failure prevention.
-
How Condé Nast Succeeds by Building a Culture that Embraces Failure
Crystal Hirschorn talks about the lessons from building a culture that embraces failure through Chaos Engineering practices, and what her teams have learned and adapted for their platforms at Condé Nast.
-
Building Resilient Serverless Systems
John Chapin explains how to use serverless technologies and an infrastructure-as-code approach to architect, build, and operate large-scale systems that are resilient to vendor failures.
-
An Engineer's Guide to a Good Night's Sleep
Nicky Wrightson gives some practical insight into how to handle failure in today's more complex distributed microservice systems.
-
Towards Specifications of Robustness - the Things That Programs do _not_ do
Sophia Drossopoulou discusses "holistic specifications", an extension of traditional program specifications that support the expression of robustness properties through spatial and temporal features.
-
It’s a Multi-cloud World, But What About the Data?
Pulkit Chandra and Nikhil Chandrappa demo a microservices application deployed in an active-active setup across two PCF foundations, and show how PCC handles data replication as well as failure.