Fault Tolerance Content on InfoQ
-
Monkeys in Lab Coats: Applying Failure Testing Research @Netflix
The authors present how lineage-driven fault injection evolved from a theoretical model into an automated failure testing system that leverages Netflix’s fault injection and tracing infrastructures.
-
Scaling Distributed Systems
Natalia Chechina outlines features of the actor and functional programming models, and explains why these models attract so much interest in the world of parallel, concurrent, and scalable systems.
-
Distributed Eventually Consistent Computations
Christopher Meiklejohn looks at applying two techniques together, deterministic data flow programming and conflict-free replicated data types, to create highly available and fault-tolerant systems.
-
Distributed Scheduling with Apache Mesos in the Cloud
Diptanu Choudhury discusses the design of Netflix’s distributed scheduler based on Mesos and Titan, focusing on bin packing algorithms, scaling clusters in and out, fault tolerance, and redundancy.
-
Thinking in a Highly Concurrent, Mostly-functional Language
Francesco Cesarini illustrates how the Erlang way of thinking about problems leads to scalable and fault-tolerant designs, and describes three ways of clustering Erlang nodes in the server-side domain.
-
Tumblr - Bits to Gifs
John Bunting talks about different services Tumblr has built and how their architecture helps them be fault tolerant as they continue to grow.
-
Fault Tolerance 101
Joe Armstrong discusses fault-tolerant systems, summarizing the key features of Erlang and showing how they can be used for programming fault-tolerant and scalable systems on multi-core clusters.
-
Fault Tolerance Made Easy
Uwe Friedrichsen discusses implementing resilient software design patterns (code included), refining those patterns to achieve robustness, and becoming a resilient software developer.
-
Fault Tolerance 101
Joe Armstrong discusses how fault tolerance relates to scalability and concurrency, and how Erlang helps build fault-tolerant systems on multi-core clusters.
-
Programming, Only Better
Bodil Stokke gives a keynote on functional programming languages for writing bug-free, fault-tolerant code, and on how they help in building simple, concurrent, and reusable software.
-
Architecting for High Availability
Attila Narin discusses AWS concepts such as Availability Zones, RDS Multi-AZ deployments, SQS, Auto Scaling, Elastic IP, load balancing, DNS, DynamoDB, and Amazon S3, as well as EC2 best practices.
-
Designing Fault Tolerant Distributed Applications
Scott Andreas discusses creating fault-tolerant distributed applications and demos Ordasity, a framework for building self-organizing distributed services.