Ariel Tseitlin discusses Netflix's failure-based suite of tools, collectively called the Simian Army, used to improve resiliency and maintain the cloud environment.
Chris Oldwood discusses what it takes to create robust software: correct error detection and recovery, testing systemic effects, app monitoring and configuration.
Zach Tellman discusses instrumenting and analyzing running systems using real world examples from Factual's production systems.
Roy Rapoport discusses how Netflix uses metrics to monitor and manage their operating environment along with some notes about their event management system.
Gary P Russell shows an application used for managing and monitoring apps built with Spring Integration, and gives an overview of the JMX support provided by Spring Integration.
Kevin Lynagh provides the rationale behind visual interfaces, and presents an example written in ClojureScript.
Lloyd Dugan discusses using the BPMN visual programming language for designing composite services and service orchestration.
Bhaven Avalani and Yuri Finklestein discuss four aspects of dealing with monitoring data encountered at eBay: reduction of data entropy, robust data distribution, metric extraction, and efficient storage.
Alex Gosse presents the current trend in application delivery, referring to cloud computing, its adoption and DevOps tools used in such environments.
Ram C Singh discusses using Big Data for infrastructure telemetry, along with good practices and an autonomic engine, to create an autonomic computing infrastructure that might prevent downtime.
Patrick Debois discusses the current state of monitoring and metrics, how developers and the company can benefit from them, and how to improve the collection of metrics and the monitoring process.