Site Reliability Engineering Content on InfoQ
Presentations
The Endgame of SRE
Amy Tobey discusses sociotechnical thinking, exploring ways SREs can impact reliability at scale.
-
Observing and Understanding Failures: SRE Apprentices
Tammy Bryant Butow covers practical lessons learned in the SRE Apprentices program and what she would change, and shares how to create and roll out such a program.
-
The SRE as a Diplomat
Johnny Boursiquot discusses the unintended consequences of certain service ownership and operational models when SRE is seen as an unwanted outside influence, and how to build trust with those teams.
-
Managing Systems in an Age of Dynamic Complexity
Laura Nolan looks at the common architectural shapes of dynamic control planes, and some examples of how they fail. Why are dynamic control planes so hard to run, and what can be done about it?
-
Pitfalls in Measuring SLOs
Danyel Fisher and Liz Fong-Jones discuss how they put the theory of SLOs into practice, and what they learned along the way that they hadn't expected.
-
Chaos Engineering for People Systems
Dave Rensin shares his experiences building stronger systems, teams, and companies at Google over the last five years.
-
Making a Lion Bulletproof: SRE in Banking
Robin van Zijll and Janna Brummel talk about the history, present and future of ING's SRE team and practices. They share lessons learned that can be applied to any organization starting or growing an SRE practice.
-
Debugging Microservices: How Google SREs Resolve Outages
Adam Mckaig and Liz Fong-Jones talk about how Google SREs discover and debug problems during outages, and share real stories from their experience.