Resilience Content on InfoQ
-
Resilient Functional Service Design
Uwe Friedrichsen explores how much functional design affects the overall robustness of a solution, and how to arrive at a better "resilient functional service design".
-
The Walking Dead - A Survival Guide to Resilient Reactive Applications
Michael Nitschinger discusses how to build event-driven applications that are resilient from the bottom up, so they can deal with remote services that are failing, slow, or misbehaving.
-
When Streams Fail: Kafka Off the Shore
Anton Gorshkov discusses how to evaluate and architect a resilient streaming platform, focusing on Kafka and Spark streaming and sharing his experience of using them to process financial transactions.
-
From Microliths to Microsystems
Jonas Boner explores microservices from first principles, distilling their essence and putting them in their true context: distributed systems based on reactive principles.
-
Building and Trusting a Cloud Bank
Greg Hawkins discusses how Starling Bank, part of the new movement in FinTech challenger banks, is innovating while addressing the need for resilience in a world where failure is everywhere.
-
Automating Chaos Experiments in Production
Ali Basiri discusses the motivation behind ChAP (Chaos Automation Platform), how they implemented it, and how Netflix service teams are using it to identify systemic weaknesses.
-
Applying Failure Testing Research @Netflix
Kolton Andrus and Peter Alvaro present how a "big idea" (lineage-driven fault injection) evolved from a theoretical model into an automated failure testing service at Netflix.
-
Architecting for Failure in a Containerized World
Tom Faulhaber discusses the new container-based toolbox for building systems that are robust in the face of failures, how to recover from failure and how the tools can be used to best effect.
-
Stranger Things: The Forces that Disrupt Netflix
Haley Tucker discusses how other systems may affect Netflix's services, and strategies for protecting those services so that they keep working even when things go wrong.
-
WebSockets, Reactive APIs and Microservices
Todd Montgomery investigates whether WebSockets, HTTP/2, Reactive Streams and microservices can deliver the high scalability, resiliency, and ease of development promised.
-
0 to 100 days - Running DRTs at Dropbox
Thomissa Comellas shares her experiences developing and rolling out new Disaster Recovery Testing techniques at Dropbox. Tammy Butow shares how her team runs DRTs and has implemented the techniques.
-
Chaos Kong - Endowing Netflix with Antifragility
Luke Kosewski describes Flow, how it adds value to a microservice architecture, what preconditions must be met for such a recovery mechanism to succeed, and tells the story of a 2015 Q4 outage.