Fault Tolerance Content on InfoQ
-
Runaway Complexity in Big Data, and a Plan to Stop It
Nathan Marz outlines several sources of complexity introduced in data systems (lack of human fault tolerance, conflation of data and queries, schemas done wrong) and what can be done to avoid them.
-
Erlang's Open Telecom Platform (OTP) Framework
Steve Vinoski introduces Erlang’s OTP Framework, outlining some of its main features, including several behaviors – implementations of common patterns useful for concurrent fault-tolerant applications.
-
Storm: Distributed and Fault-tolerant Real-time Computation
Nathan Marz discusses Storm concepts (streams, spouts, bolts, topologies), explaining how to use Storm’s Clojure DSL for real-time stream processing, distributed RPC and continuous computation.
-
Anomaly Detection, Fault Tolerance and Anticipation Patterns
John Allspaw discusses fault tolerance, anomaly detection and anticipation patterns helpful to create highly available and resilient systems.
-
Techniques for Scaling the Netflix API
Daniel Jacobson covers the history of Netflix’s APIs, adaptation for the cloud, development and testing, resiliency, and the future of their APIs.
-
Architecting for Failure at the Guardian.co.uk
Michael Brunton-Spall talks about various types of system failure that can happen, sharing the lessons learned at the Guardian and measures taken to prevent and mitigate failure.
-
Building Highly Available Systems in Erlang
Joe Armstrong discusses highly available (HA) systems, introducing different types of HA systems and data, HA architecture and algorithms, 6 rules of HA, and how HA is done with Erlang.
-
Storm: Distributed and Fault-tolerant Real-time Computation
Nathan Marz explains Storm, a distributed, fault-tolerant, real-time computation system currently used by Twitter to keep statistics on user clicks for every URL and domain.
-
Above the Clouds: Introducing Akka
Jonas Bonér introduces Akka, a JVM platform that aims to address the complex problems of concurrency, scalability and fault tolerance using Actors, STM and self-healing from crashes.
-
Things Break, Riak Bends
Justin Sheehy talks about failure and the need to prepare for it, giving some real-life examples along with techniques implemented in Riak to make it resilient to faults.
-
Message Passing Concurrency in Erlang
Joe Armstrong explains through Erlang examples that message-passing concurrency is the foundation of scalable fault-tolerant systems.
-
Failure Comes in Flavors - Stability Anti-patterns
Michael Nygard encourages us to have a failure-oriented mindset. He presents many anti-patterns that lead to system instability and failure, accompanied by design patterns that should be used instead.