InfoQ Homepage Failure Content on InfoQ

News

Culture & Methods

How to Improve Software Team Performance with Experimentation

According to Terhi Aho, experimentation is a way of thinking that guides action. By experimenting we can develop ways of working without a major change process. It can help software teams to solve problems in small steps, relieve their workload, and foster self-management.

Ben Linders
on Oct 03, 2024
Culture & Methods

A Distributed System is Knowable: an Impossible Thing for Developers

Failure in distributed systems is normal. A distributed system can provide only two of three guarantees: consistency, availability, and partition tolerance. According to Kevlin Henney, this limits how much you can know about how a distributed system will behave. He gave a keynote about Six Impossible Things at QCon London 2022 and at QCon Plus May 10-20, 2022.

Ben Linders
on Sep 01, 2022
Architecture & Design

Dealing with Thundering Herd at Braintree

Braintree engineer Anthony Ross explained in a recent article how introducing some random jitter into retry intervals for failed tasks solved a thundering herd issue which was impacting the efficiency of their payment dispute management API.

Sergio De Simone
on May 19, 2022
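
The jitter idea in the Braintree item above can be illustrated with a minimal sketch. It assumes exponential backoff with "full jitter" (a delay drawn uniformly between zero and the capped backoff), which is one common variant and not necessarily the scheme Braintree used. The function name `retry_delay` and its parameters `base`, `cap`, and `rng` are hypothetical, chosen only for illustration.

```python
import random


def retry_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a randomized delay in seconds for a 0-based retry attempt.

    Hypothetical helper: the backoff ceiling doubles with each attempt
    up to `cap`, and the actual delay is drawn uniformly from
    [0, ceiling] so that tasks which failed together do not all retry
    at the same instant.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


if __name__ == "__main__":
    # Five tasks that failed at the same moment get spread-out retry times.
    for task_id in range(5):
        print(task_id, round(retry_delay(attempt=3), 2))
```

Without the random draw, every task on its third retry would wait exactly 8 seconds and hit the API at once; with it, the retries spread across the 0 to 8 second window.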
Cloud

Microsoft Announces Azure Chaos Studio in Public Preview

At the recent Ignite, Microsoft announced the public preview of Azure Chaos Studio, a fully-managed experimentation service to help customers track, measure, and mitigate faults with controlled chaos engineering to improve the resilience of their cloud applications.

Steef-Jan Wiggers
on Nov 10, 2021
Culture & Methods

How a Safe-to-Fail Approach Can Enable Psychological Safety in Teams

Companies can establish a culture of psychological safety among their employees, a culture in which failing is not frowned upon but rather is accepted as something that can happen to anyone. Safe-to-fail should be part of the corporate culture. A shift in the way we envision success can lead to a better understanding of where failure lies and provide courage to overcome our fears.

Ben Linders
on Oct 14, 2021
DevOps

AWS Announces Chaos Engineering as a Service Offering

AWS has announced the upcoming release of their chaos engineering as a service offering. The Fault Injection Service (FIS) will provide fully-managed chaos experiments across a number of AWS services. The service includes pre-built templates that generate disruptions mimicking common real-world events. It can be integrated into CI pipelines via API.

Matt Campbell
on Dec 21, 2020
Java

New LiveRecorder for Java Enables Software Failure Replay

LiveRecorder for Java is a newly released application for software failure replay. It enables developers to record application failures and then replay them in IntelliJ to find the cause of the failure. It helps to reduce the debugging time, especially with intermittent failures.

Johan Janssen
on Aug 31, 2020
DevOps

Cloudflare’s 27 Minutes Outage Explained

Cloudflare recently suffered a partial outage that lasted 27 minutes and caused a 50% drop in traffic across its network.

Aditya Kulkarni
on Aug 29, 2020
DevOps

Failure Modes and Building Resilient Systems: Adrian Cockcroft at QCon SF

At the recent QCon San Francisco event, Adrian Cockcroft shared his thoughts on how to produce resilient systems that operate successfully despite failures. He also shared what he considers to be good cloud resilience patterns for building with a continuous resilience mindset.

Matt Campbell
on Dec 18, 2019
DevOps

How Did Things Go Right? Learning More from Incidents at Netflix: Ryan Kitchens at QCon New York

At QCon New York, Ryan Kitchens presented “How Did Things Go Right? Learning More from Incidents”. Key takeaways from the talk included: recovery is better than prevention; an incident occurs when there is a “perfect storm” of events -- there is no root cause; “stop reporting on the nines”, as user happiness is more important; and there is value in learning how things go right.

Daniel Bryant
on Jul 05, 2019
Culture & Methods

How to Grow Teams That Can Fail without Fear: QCon London Q&A

Blameless failure starts with building a culture where failure is acknowledged, shared, investigated, remedied, and prevented, said Emma Button, a DevOps and cloud consultant, at QCon London 2019. Visualising the health and state of your system with CI/CD practices can increase trust and ownership and invite people to help out when things fail.

Ben Linders
on Mar 05, 2019
DevOps

What Resiliency Means at Sportradar

Pablo Jensen, CTO at Sportradar, talked at this year's QCon London conference about the practices and procedures in place at Sportradar to ensure their systems meet expected resiliency levels. Jensen explained that reliability is influenced not only by technical concerns but also by organizational structure and governance and by client support, and that it requires ongoing effort to continuously improve.

Manuel Pais
on Apr 06, 2018
DevOps

Chaos Engineering at Twilio

The Twilio team describes their foray into Chaos Engineering where they use Gremlin to inject failures into their homegrown queuing system shards to test for automated recovery.

Hrishikesh Barua
on Dec 25, 2017
Culture & Methods

How to Measure Continuous Delivery

Stability and throughput are the things that you can measure when adopting continuous delivery practices. These metrics can help you reduce uncertainty, make better decisions about which practices to amplify or dampen, and steer your continuous delivery adoption process in the right direction.

Ben Linders
on Sep 21, 2017
Cloud

Public Preview of Azure IaaS Disaster Recovery Announced

In a recent announcement, Microsoft released details about its public preview for Infrastructure-as-a-Service (IaaS) disaster recovery using Azure Site Recovery (ASR). Using the ASR service, organizations can protect IaaS workloads in one Azure region and have them replicated to a different Azure region within a geographical cluster.

Kent Weare
on Aug 07, 2017
