Theo Schlossnagle talks about lessons learned in building an always-on distributed time-series database with aggressive quality of service guarantees, and techniques for dealing with bad machines.
Jamshid Mahdavi explains how WhatsApp has developed their server components, the deployment processes, and how they monitor, alert, and repair the inevitable failures in a billion-user service.
Christopher Meiklejohn talks through a history of chain replication, starting with the original work from 2004 by van Renesse and Schneider up to new and unique designs of chain replication.
Fangjin Yang covers common problems and failures seen with distributed systems, and discusses design patterns that can be used to maintain data integrity and availability when everything goes wrong.
Itamar Syn-Hershko shows how various technologies (Storm, Node.js, Riemann, collectd, D3.js, ELK, PagerDuty, Slack) are used to power Forter’s service and keep it highly available and under control.
Ruslan Meshenberg discusses Netflix's challenges, operational tools, and best practices needed to provide high availability across multiple regions.
Jeff Johnson introduces Apollo, a hierarchical NoSQL data system meant to deal with Facebook's distributed storage needs.
Aviran Mordo introduces Wix's architecture, a highly available, eventually consistent system, along with patterns for rendering many websites with a relatively small number of servers.
Michael Nygard discusses several loopholes in the CAP theorem that can be used to engineer practical, real-world systems with desirable features.
Paul Gross explains how Braintree deals with high availability for their Ruby application.
Summly: An Award Winning Mobile App's Journey to the Cloud with Five-9s Availability on a Shoestring Budget
Eugene Ciurana describes the architectural choices, server configuration, database, and caching systems that enabled Summly to achieve Five-9s availability with cross-continental deployments.
Attila Narin discusses AWS concepts: Availability Zones, RDS Multi-AZ deployments, SQS and Auto Scaling, Elastic IP, load balancing, DNS, DynamoDB, Amazon S3, etc., and EC2 best practices.