Resilience Content on InfoQ
-
Resilience in Supply Chain Security
Dan Lorenc goes over real-world threats facing open source supply-chains today, and what can be done to architect resilient build and delivery pipelines.
-
Panel: Observability and Understandability
Jason Yee, John Egan, and Ben Sigelman discuss their approaches and preferred methods to get impactful results in incident management, distributed tracing, and chaos engineering.
-
Incident Analysis: Your Organization's Secret Weapon
Nora Jones discusses how to move faster and focus on the things that matter by using incident analysis.
-
More More More! Why the Most Resilient Companies Want More Incidents
John Egan discusses how companies of any scale can improve their understandability by lowering their barriers to incident reporting and simplifying their processes for documenting postmortems.
-
Complex Systems: Microservices and Humans
Katharina Probst discusses best practices for building, evolving, and operating microservices, lessons learned from containers, service meshes, DevOps, chaos and load testing, and planning for growth.
-
Building Reliability One Step at a Time
Ana Margarita Medina shares how she has been using Chaos Engineering and how it can be used to decouple our system’s weak points, learn from incidents and improve monitoring and observability.
-
A Sticky Situation: How Netflix Gains Confidence in Changes
Haley Tucker discusses sticky canaries, what they are and how they can help, and how to build confidence in changes.
-
Scaling Culture of Resiliency in the Enterprise
Nate Vogel shares how he grew the data engineering team with an emphasis on building a culture of reliability, discussing processes and tools used.
-
IBM’s Principles of Chaos Engineering
Haytham Elkhoja discusses the process of getting engineers from across the organization to agree on a list of Chaos Engineering principles, adapting existing principles to customer requirements and internal services.
-
Self-Service Chaos Engineering: Fitting Gremlin into a DevOps Culture
Doug Campbell shares how they rolled out Gremlin at Grubhub and how they educated and enabled all engineering teams to use it.
-
Continuous Resilience
Adrian Cockcroft talks about how to build robust systems by being more systematic about hazard analysis, and including the operator experience in the hazard model.
-
Certainty among the Chaos
Marco Coulter discusses the capabilities of chaos engineering beyond resiliency to support capacity optimization.