Resilience Content on InfoQ
-
Lessons Learned from the CrowdStrike Incident: InfoQ Dev Summit Munich 2024 Preview
In this podcast episode, speakers from the InfoQ Dev Summit Munich 2024 discuss the recent CrowdStrike incident, which triggered widespread outages and highlighted vulnerabilities in cloud infrastructure. The panel shares personal experiences and emphasizes the implications of cloud dependency and the lessons organizations have learned about risk management and automation.
-
Courtney Nash Discusses Incident Management, Automation, and the VOID Report
In this episode, Courtney Nash, a researcher focused on system safety and failures in complex sociotechnical systems, discussed the latest edition of the VOID report. Topics covered included: incident management and the role of automation, working effectively within socio-technical systems, and the value of collecting and analyzing system metrics in the good times and the bad.
-
Ana Medina on Chaos Engineering, Game Days, and Learning
Topics discussed included: how enterprise organisations are adopting chaos engineering with the requirements for guardrails and the need for “status checks” to ensure pre-experiment system health; how to run game days or IT fire drills when everyone is working remotely; and why teams should continually invest in learning from past incidents and preparing for inevitable failures within systems.
-
Software Architecture and Design InfoQ Trends Report 2021
Here is an overview of how the InfoQ editorial team sees the Software Architecture and Design topic evolving in 2021, with a focus on what architects are designing for today.
-
The InfoQ Podcast: Software Architecture and Design InfoQ Trends Report—April 2021
An overview of how the InfoQ editorial team sees the Software Architecture and Design topic evolving in 2021, with a focus on what architects are designing for today.
-
Resilience, Observability and Unintended Consequences of Automation
In this podcast, Shane Hastie, the Lead Editor for Culture & Methods, spoke to Courtney Nash about her research on the unintended consequences of automation in software systems, the importance of learning from incidents, and maintaining human expertise in complex systems.
-
Resilience and Incident Management with Vanessa Huerta Granda
In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Vanessa Huerta Granda, Manager of Resiliency Engineering at Enova, about resilience and incident management.
-
Exploring the Impact of Generative AI on Software Engineering and Career Paths
In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Alex Cruikshank, Director of Software Engineering at West Monroe.
-
Building Organizational Resilience through Documentation and InnerSource Practices
In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to David Grizzanti, a principal engineer at the New York Times, about the importance of documentation for organizational resilience, the concept of InnerSource, the parallels between engineering and art, and the challenges facing engineering leaders, along with advice for them.
-
Crisis Management, Black Swans and Resilience
In this podcast, Shane Hastie, Lead Editor for Culture & Methods, spoke to Sharon Robson about crisis management and business resilience, particularly in the context of technology and software supply chains.