Availability Content on InfoQ
-
Greenwater, Washington: an Availability Story
Marc Brooker discusses defining and designing for availability that takes people into account, including examples of massive-scale cloud systems designed using these principles.
-
Always Available
Claudio Ortolina discusses leveraging Elixir/OTP tools to provide continuous service even when a database is down, walking through the refactoring of an Elixir/Phoenix/PostgreSQL application.
-
Low Latency Trading Architecture at LMAX Exchange
Sam Adams gives an overview of the architecture LMAX Exchange uses to deliver over $2 trillion a year through their platform, and shares their experience building a high-availability stateful system.
-
Adaptive Availability for Quality of Service
Theo Schlossnagle talks about lessons learned in building an always-on distributed time-series database with aggressive quality of service guarantees, and techniques for dealing with bad machines.
-
An Erlang-Based Philosophy for Service Reliability
Jamshid Mahdavi explains how WhatsApp has developed their server components and deployment processes, and how they monitor, alert on, and repair the inevitable failures in a billion-user service.
-
A Brief History of Chain Replication
Christopher Meiklejohn talks through a history of chain replication, starting with the original work from 2004 by van Renesse and Schneider up to new and unique designs of chain replication.
-
Architecting Distributed Databases for Failure
Fangjin Yang covers common problems and failures seen with distributed systems, and discusses design patterns that can be used to maintain data integrity and availability when everything goes wrong.
-
Logging Makes Perfect - Real-world Monitoring and Visualizations
Itamar Syn-Hershko shows how various technologies (Storm, Node.js, Riemann, collectd, D3.js, ELK, PagerDuty, Slack) are used to power Forter’s service and keep it highly available and under control.
-
How Netflix Leverages Multiple Regions to Increase Availability: An Active-Active Case Study
Ruslan Meshenberg discusses the challenges Netflix faced, and the operational tools and best practices needed to provide high availability across multiple regions.
-
How Facebook Scales Big Data Systems
Jeff Johnson introduces Apollo, a hierarchical NoSQL data system meant to deal with Facebook's distributed storage needs.
-
Wix Architecture at Scale
Aviran Mordo introduces Wix's architecture, a highly available, eventually consistent system, along with patterns for rendering many websites with a relatively small number of servers.
-
Exploiting Loopholes in CAP
Michael Nygard discusses several loopholes in the CAP theorem that can be used to engineer practical, real-world systems with desirable features.