Theo Schlossnagle talks about lessons learned in building an always-on distributed time-series database with aggressive quality-of-service guarantees, and techniques for dealing with bad machines.
Pat Patterson and Ted Malaska talk about current and emerging data processing technologies, and the various ways of achieving "at least once" and "exactly once" timely data processing.
Brandon Philips describes how bringing containers, schedulers, and distributed systems together will create more reliable and far more trusted server infrastructure.
Alan Ngai and Premal Shah discuss best practices for monitoring distributed real-time data processing frameworks and how DevOps can gain control and visibility over these data pipelines.
Heidi Howard explores how to construct resilient distributed systems on top of unreliable components. Howard discusses which algorithms are best suited to different situations.
Aysylu Greenberg discusses some of the new architectural patterns from systems she has worked on at Google and the related work that provides insights into the motivations behind them.
Alvaro Videla reviews distributed systems concepts: async/sync, message passing, shared memory, failure detectors, leader election, consensus, and different kinds of replication, and recommends related books.
Mathieu Bastian explores the mechanics of unit, integration, data and performance testing for large, complex data workflows, along with the tools for Hadoop, Pig and Spark.
Gian Merlino discusses stream processors and a common use case, keeping databases up to date, along with the challenges they present, with examples from Kafka, Storm, Samza, Druid, and others.
Christopher Meiklejohn talks through a history of chain replication, starting with the original work from 2004 by van Renesse and Schneider up to new and unique designs of chain replication.
Diego Ongaro introduces Raft, a consensus algorithm for managing a replicated log by separating the key elements of consensus and reducing the number of states that must be considered.
Fangjin Yang covers common problems and failures seen with distributed systems, and discusses design patterns that can be used to maintain data integrity and availability when everything goes wrong.