Josh Evans uses the Netflix Operations Engineering team as a case study to explore the challenges faced by centralized engineering teams and approaches to addressing those challenges.
Michael Brunton-Spall shows how DevOps-like patterns can be applied to microservices to give development teams more responsibility for their choices, and much more.
Dustin Huptas and Andreas Schmidt present some of the operational challenges encountered when dealing with microservices, and offer solutions from the field of automation and service discovery.
John Wilkes shares lessons learned managing clusters at the scale of Google.
Robert Benefield offers a pragmatic overview of how to discover operational indicators that provide valuable insight into running and improving online services.
Pedro Canahuati describes how Facebook's operations maintains its infrastructure, including challenges faced and lessons learned: prioritizing calls, managing technical debt, and incident management.
Ben Christensen describes the Netflix API's evolution to a web service platform serving all devices and users, and the challenges met in operations, deployment, performance, fault-tolerance, and innovation.
Joe Sondow presents how Netflix uses Asgard to deploy code updates and manage resources in the Amazon cloud.
Roy Rapoport discusses how Netflix uses metrics to monitor and manage its operating environment, along with some notes about its event management system.
Filippos Santas explains how to apply service-orientation principles, patterns, processes and SOA governance precepts to ITIL's service lifecycle stages, key processes and activities.
Phil Toland discusses using Erlang and Ruby to provide backup for 20k network devices running in 8 datacenters across 3 continents for Rackspace’s operations.
Ram C Singh discusses using Big Data for infrastructure telemetry, along with good practices and an autonomic engine, to create an autonomic computing infrastructure that might prevent downtime.