Resilience Content on InfoQ
-
Fault Tolerance Made Easy
Uwe Friedrichsen discusses implementing resilient software design patterns (code included), improving those patterns to achieve robustness, and becoming a resilient software developer.
-
From Instability to Resilience: The Story of a Web Site
Richard Campbell shares his experiences evolving a web site from ordinary to resilient, the triage process, the quick-and-dirty solutions as well as the work to bring the site to true resiliency.
-
Principles of Reliable Communication & Shared State
Andy Piper describes some fundamentals of communicating reliably in an unreliable world and communication techniques used to build distributed data structures that can tolerate failures.
-
Going Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Jonas Bonér discusses how the four traits of reactive apps (Event-Driven, Scalable, Resilient and Responsive) impact app design, how they interact, and their supporting technologies and techniques.
-
Failure: The Good Parts
Viktor Klang keynotes on the imminence of failure and the need to prepare for it, along with several ways of managing failure when it happens.
-
How Netflix Architects for Survival
Jeremy Edberg discusses how Netflix designs its systems to survive outages, network latency and random instance failure.
-
Partitions for Everyone!
Kyle Kingsbury discusses some of the limitations found in distributed systems and the way some of them behave under partitioning.
-
Resiliency through Failure - Netflix's Approach to Extreme Availability in the Cloud
Ariel Tseitlin discusses Netflix's failure-based suite of tools, collectively called the Simian Army, used to improve resiliency and maintain the cloud environment.
-
Systems that Run Forever Self-heal and Scale
Joe Armstrong outlines the architectural principles needed for building scalable fault-tolerant systems from small isolated parallel components which communicate through well-defined protocols.