Resilience Content on InfoQ
-
Generative AI and Organizational Resilience
Generative AI will profoundly transform communication and information sharing over the next decade, but the change will be uneven across industries and roles. Organizations should empower workers to use AI augmentation thoughtfully, while building literacy on capabilities and limits. A balanced, conscientious integration, using iterations and customer feedback, will produce the best outcomes.
-
Orchestrating Resilience Building Modern Asynchronous Systems
In this article, we discuss the problems we had to solve at Twilio to build a resilient, scalable asynchronous system for a complex workflow, and the advantages we gained from adopting a Workflow Orchestration solution, including abstracted state management and out-of-the-box support for retries, observability, and auditability.
-
The Incident Lifecycle: How a Culture of Resilience Can Help You Accomplish Your Goals
Don’t get stuck with overwhelmed systems that can cause an outage, like what happened with Taylor Swift concert tickets. Build organizational resilience to incidents through improved coordination and communication during the response, and blameless reviews, root cause analysis, and insightful communication afterward to enable meaningful change.
-
Write More, Talk Less: Building Organizational Resilience through Documentation and InnerSource
Better documentation and knowledge sharing create transparency that aids onboarding, prevents disruption from turnover, and withstands reorganizations. Different practices can help, such as communicating asynchronously, creating incentives for documentation, making docs discoverable, understanding team members' preferences, and providing dedicated writing time. And maybe InnerSource can help too.
-
Debugging Production: eBPF Chaos
This article shares insights into learning eBPF, a new cloud-native technology that aims to improve observability and security workflows. You’ll learn how chaos engineering can help, and get an insight into eBPF-based observability and security use cases. Breaking them in a professional way also inspires new ideas for chaos engineering itself.
-
How We Improved Application’s Resiliency by Uncovering Our Hidden Issues Using Chaos Testing
This article lists the chaos testing principles outlined by Netflix and explains the advantages and disadvantages of chaos testing, to help readers decide whether to perform it. It also explains why we should convince management to perform chaos tests, weighing the benefits against the risks.
-
How Do We Utilize Chaos Engineering to Become Better Cloud-Native Engineers?
Engineers these days are closer to the product and to customer needs. There is still a long way to go, however, and companies still struggle to get engineers close enough to their customers to understand their business impact in depth: what do they solve, what is their influence on the customer, and what is their impact on the product?
-
Chaos Engineering and Observability with Visual Metaphors
This article introduces a new actor for visualising chaos engineering and observability: metaphors. It provides the conceptual foundations of chaos engineering and observability, presents the state of the art in visualisation techniques available on the market, and shows how treemaps, gauge charts, geocentric and city metaphors can enrich the range of visual strategies for observing chaos.
-
DevOps and Cloud InfoQ Trends Report - July 2021
This article summarizes how we see the "cloud computing and DevOps" space in 2021, which focuses on fundamental infrastructure and operational patterns, the realization of patterns in technology frameworks, and the design processes and skills that a software architect or engineer must cultivate.
-
Building Reliable Software Systems with Chaos Engineering
Advances in large-scale, distributed software systems are changing the game for software engineering. As an industry, we are quick to adopt practices that improve flexibility and feature velocity. If we can move quickly, can we do so without breaking things? Chaos Engineering practices can be used to navigate complexity and build more reliable systems.
-
Continuous Learning as a Tool for Adaptation
The fifth and capstone article in a series on how software companies adapted, and continue to adapt, to enhance their resilience explores key themes with a special focus on the practicality of organizational resilience. It also provides practical guidance for engineering leadership and recommendations on how to make this investment.
-
Software Architecture and Design InfoQ Trends Report—April 2021
An overview of how the InfoQ editorial team sees the Software Architecture and Design topic evolving in 2021, with a focus on what architects are designing for today.