Ariel Tseitlin discusses Netflix's suite of tools, collectively called the Simian Army, used to improve resiliency and maintain the cloud environment. The tools simulate failures to show how the system reacts to them.
Roy Rapoport discusses how Netflix uses metrics to monitor and manage its operating environment, and shares some notes about its event management system.
Bhaven Avalani and Yuri Finklestein discuss four aspects of dealing with monitoring data at eBay: reduction of data entropy, robust data distribution, metric extraction, and efficient storage.
Alex Gosse presents the current trend in application delivery, referring to cloud computing, its adoption and DevOps tools used in such environments.
Ram C Singh discusses using Big Data for infrastructure telemetry along with good practices and an autonomic engine to create an autonomic computing infrastructure that might prevent downtime.
Patrick Debois discusses the current state of monitoring and metrics, how developers and the company can benefit from them, and how to improve the collection of metrics and the monitoring process.
Marc Borbas discusses the importance of Application Performance Monitoring, explaining how it can be done with AMQP.
In this presentation from QCon London 2008, Bertrand Delsart discusses real-time (RT) computing requirements in banking, RT Java history, priority semantics, RT APIs, determinism and NoHeapRealtimeThreads, RT garbage collection, soft vs. hard RT, deadline miss handlers, event-driven requests with deadlines, reducing context switches, and the benefits of RT Java and the RT garbage collector.