Tom Faulhaber discusses the new container-based toolbox for building systems that are robust in the face of failures, how to recover from failure and how the tools can be used to best effect.
Haley Tucker discusses how other systems may affect Netflix's services, and strategies to protect those services so they keep working even when things go wrong.
Thomissa Comellas shares her experiences developing and rolling out new Disaster Recovery Testing techniques at Dropbox. Tammy Butow shares how her team runs DRTs and has implemented the techniques.
Marcus Frodin discusses a few failures he has overseen at Spotify, deriving a framework of how to think about and evaluate what worked and what didn’t, and how to get more of the things that did.
The panelists discuss some of the unique problems that only data science can solve, the pitfalls and the success rate of data science projects.
Adrian Cockcroft discusses success and failure stories of adopting microservices, gives an overview of what’s next with microservices and presents some of the techniques that have led to successful deployments.
Bruce Haefele shares lessons from the successes and failures of implementing an API strategy at Healthdirect Australia.
Alvaro Videla reviews distributed systems: async/sync, message passing, shared memory, failure detectors, leader election, consensus and different kinds of replication, and recommends related books.
Sadek Drobi talks about the prismic.io API, how to understand the properties and mechanics of a system, and how to partition its different dimensions to avoid a domino-style failure cascade.
Pete Smith draws on his experience to discuss what it means to fail and how to make the most of it.
Tammer Saleh talks about the mistakes made building microservices, when microservices are appropriate, where to draw the lines between services, performance issues, testing, debugging, failure, etc.
Fangjin Yang covers common problems and failures seen with distributed systems, and discusses design patterns that can be used to maintain data integrity and availability when everything goes wrong.