Roy Rapoport discusses how Netflix uses metrics to monitor and manage its operating environment, and shares some notes about its event management system.
Kathleen Ting details 8 misconfigurations that can bring ZooKeeper down.
Filippos Santas explains how to apply service-orientation principles, patterns, processes and SOA governance precepts to ITIL's service lifecycle stages, key processes and activities.
Koa McCullough presents best practices for running Percona Server and MySQL in the cloud, cloud backups using EBS, Xtrabackup and S3, using Percona Toolkit to simplify operations, and XtraDB Cluster.
Craig Kerstiens presents the history of Postgres, the basics of developing with Postgres, notes on its performance, and tips on querying it.
Chris Pinkham explains how to create an automated scalable self-service infrastructure based on principles used by Amazon to build their cloud services.
Bryan O'Sullivan introduces some of the technologies pioneered in the Haskell community to streamline software development and reduce operational costs, while producing beautiful code.
Gareth Rushgrove offers advice and code samples, and introduces tools (Puppet, Chef and CloudFormation) helpful for automating infrastructure operations.
Nathan Marz outlines several sources of complexity introduced in data systems (lack of human fault-tolerance, conflation of data and queries, schemas done wrong) and what can be done to avoid them.
Phil Toland discusses using Erlang and Ruby to provide backup for 20k network devices running in 8 datacenters across 3 continents for Rackspace’s operations.
Ram C Singh discusses using Big Data for infrastructure telemetry along with good practices and an autonomic engine to create an autonomic computing infrastructure that might prevent downtime.