Incident Response Content on InfoQ
Podcasts
Anurag Gupta on Day 2 Operations, DevOps, and Automated Remediation
In this podcast, Anurag Gupta, founder and CEO of Shoreline.io, sat down with InfoQ podcast host Daniel Bryant to discuss the role of DevOps and site reliability engineering (SRE), day 2 operations, and the importance of building observability into applications and platforms.
Ryan Kitchens on Learning from Incidents at Netflix, the Role of SRE, and Sociotechnical Systems
In today’s podcast, we sit down with Ryan Kitchens, a senior site reliability engineer and member of the CORE team at Netflix. This team is responsible for the entire lifecycle of incident management at Netflix, from incident response to memorialising an issue.
Sponsored Content
The Blameless Complete Guide to Incident Management Part 1
You can never fully prevent incidents, so it's important to resolve them as efficiently as possible. This eBook will break down what to do when things go wrong.
Bridging the Gap: DevOps to SRE
Enhance your incident management by investing in a powerful toolbox, aligning on SLOs, and creating a just culture. This eBook gives you practical steps for implementing SRE practices.
Beyond the 4 SRE Golden Signals
The Four Golden Signals are only the foundation for a more meaningful understanding of system health. In this eBook, we'll examine how to get the most out of the golden signals, and show you how to build beyond them.