Canary Analyze All The Things: How We Learned to Keep Calm and Release Often
Summary
Roy Rapoport discusses canary analysis, deployment, and observability patterns he believes are generally useful, and talks about the difference between manual and automated canary analysis.
Bio
Roy Rapoport manages the Insight Engineering group at Netflix, responsible for building Netflix's Operational Insight platforms, including cloud telemetry, alerting, and real-time analytics. He originally joined Netflix as part of its datacenter-based IT/Ops group, and prior to transferring over to Product Engineering, was managing Service Delivery for IT/Ops.
About the conference
Software is Changing the World. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.
Community comments
Great concepts !
by Richard Langlois,
Roy, I often read about canary deployment, but this is the first time I've heard about how to properly analyze the results. Thanks for sharing your experience; I will try to start implementing those concepts in my workplace.
Richard Langlois
Principal Software Engineer, Microsoft, Burlington, MA.
Ratios and canary-baseline comparison
by Fotis Stamatelopoulos,
Roy, thank you for sharing details and insight on the canary analysis patterns. More than two years after the original talk, it is still valuable to view the presentation and go through the slides. It would be really useful if you could share some additional details on the canary-baseline comparison and the score calculation:

(a) How do you define the ratio shown in the dashboard on slide 50? My initial thought was that you divide the two time series (canary/baseline) and get a metric which reveals similar performance when it is near 1.0. But some values (in the dependency section) are so much larger than 1.0 and still do not trigger a hot status, so my hypothesis is seriously challenged.

(b) Could you also provide some more insight on the principles applied when comparing a metric between the canary and the baseline? From the dashboard it seems that you define an upper and lower tolerance boundary (e.g. +/- 20%, or +/- 2%), which also challenges my hypothesis on the ratio definition, since similar ratio values for different metrics show up as normal or hot.

Thanks again for the great presentation and the extremely insightful information shared.
Fotis Stamatelopoulos, Software Architect, Upwork Inc.
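The comparison the commenter hypothesizes can be sketched in a few lines. This is purely an illustration of that hypothesis, not Netflix's actual scoring algorithm: the metric names, per-metric tolerances, and status labels below are invented for the example. Note how the same ratio (e.g. 1.10) can read as normal for one metric and hot for another simply because each metric carries its own tolerance band, which would explain the dashboard behavior described in point (b).

```python
def classify_metric(canary: float, baseline: float, tolerance: float) -> str:
    """Compare a canary metric to its baseline counterpart via their ratio.

    Returns 'normal' when canary/baseline stays within +/- tolerance
    of 1.0, 'hot' when it falls outside that band, and 'inconclusive'
    when there is no baseline signal to divide by.
    """
    if baseline == 0:
        # Division by zero: no baseline traffic/signal, so no verdict.
        return "inconclusive"
    ratio = canary / baseline
    lower, upper = 1.0 - tolerance, 1.0 + tolerance
    return "normal" if lower <= ratio <= upper else "hot"


# Hypothetical per-metric tolerance bands: latency is held to a much
# tighter band (+/- 2%) than CPU utilization (+/- 20%).
tolerances = {"cpu_utilization": 0.20, "request_latency_ms": 0.02}

# (canary value, baseline value) samples -- also illustrative.
samples = {
    "cpu_utilization": (55.0, 50.0),       # ratio 1.10, inside +/- 20%
    "request_latency_ms": (110.0, 100.0),  # ratio 1.10, outside +/- 2%
}

for name, (canary, baseline) in samples.items():
    print(name, classify_metric(canary, baseline, tolerances[name]))
```

Under these assumptions, both metrics have a ratio of 1.10, yet CPU utilization reports normal while request latency reports hot, matching the observation that similar ratios can map to different statuses.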