Resilience Content on InfoQ
-
Lessons Learned from the CrowdStrike Incident: InfoQ Dev Summit Munich 2024 Preview
In this podcast episode, speakers from the InfoQ Dev Summit Munich 2024 discuss the recent CrowdStrike incident, which triggered widespread outages and highlighted vulnerabilities in cloud infrastructure. The panel shares personal experiences and discusses the implications of cloud dependency and the lessons organizations can learn about risk management and automation.
-
Courtney Nash Discusses Incident Management, Automation, and the VOID Report
In this episode, Courtney Nash, a researcher focused on system safety and failures in complex sociotechnical systems, discussed the latest edition of the VOID report. Topics covered included incident management and the role of automation, working effectively within sociotechnical systems, and the value of collecting and analyzing system metrics in good times and bad.
-
Ana Medina on Chaos Engineering, Game Days, and Learning
Topics discussed included: how enterprise organizations are adopting chaos engineering, including the need for guardrails and for “status checks” that confirm the system is healthy before an experiment begins; how to run game days or IT fire drills when everyone is working remotely; and why teams should continually invest in learning from past incidents and preparing for inevitable failures within their systems.
-
Software Architecture and Design InfoQ Trends Report 2021
Here is an overview of how the InfoQ editorial team sees the Software Architecture and Design topic evolving in 2021, with a focus on what architects are designing for today.
-
The InfoQ Podcast: Software Architecture and Design InfoQ Trends Report—April 2021
An overview of how the InfoQ editorial team sees the Software Architecture and Design topic evolving in 2021, with a focus on what architects are designing for today.
-
How Blameless Culture Transforms Engineering Teams
In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Tameem Hourani about building a blameless engineering culture through radical transparency, focusing on system resilience rather than individual blame, and creating high-performing teams that can embrace change and learn from failures.
-
The Evolution of Code Review: From Bug-Finding to Team Building
In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Greg Foster about the evolution and purpose of code reviews, building teams with kindness, expertise, and urgency, and how AI tools are changing software development.
-
Building a Resilient and Inclusive Engineering Culture with Matthew Card
In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Matthew Card about his resilience framework, CAPSS (Confidence, Adaptability, Purpose, Social Support). The framework helped him overcome career challenges and now guides how he builds inclusive engineering cultures by empowering teams and breaking down echo chambers.
-
Resilience, Observability and Unintended Consequences of Automation
In this podcast, Shane Hastie, the Lead Editor for Culture & Methods, spoke to Courtney Nash about her research on the unintended consequences of automation in software systems, the importance of learning from incidents, and maintaining human expertise in complex systems.
-
Resilience and Incident Management with Vanessa Huerta Granda
In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Vanessa Huerta Granda, Manager of Resiliency Engineering at Enova, about resilience and incident management.