Fault Tolerance Content on InfoQ

  • AWS Simplifies Multi-Region Failover with ARC Region Switch

    Amazon Application Recovery Controller (ARC) Region Switch brings a fully managed, centralized approach to multi-region failover. It simplifies disaster recovery by automating and coordinating the essential recovery tasks across AWS services. With proactive validation and a global dashboard, it turns complex failover processes into push-button drills that teams can run with confidence, improving reliability and cost efficiency.

  • Amazon VPC Route Server Generally Available, Providing Routing Flexibility and Fault Tolerance

    AWS has recently announced the general availability of Amazon VPC Route Server. This new option simplifies dynamic routing in a VPC, allowing developers to advertise routing information via Border Gateway Protocol (BGP) from virtual appliances and dynamically update the VPC route tables associated with subnets and internet gateways.

  • How Google Does Chaos Testing to Improve Spanner's Reliability

    To keep their Spanner database working reliably, Google engineers use chaos testing: they inject faults into production-like instances and stress the system's ability to behave correctly in the face of unexpected failures.

  • Decathlon Adopts Backend for Frontend (BFF) Pattern to Empower FE Teams

    Decathlon established the Backend for Frontend (BFF) architectural pattern as a company-wide recommendation and provided guidelines for adopting it across engineering teams. A four-part series introduces the pattern and explores its benefits and potential pitfalls. The company also describes alternatives to the BFF pattern and reviews the related architectural considerations.

  • Grab Improves Kafka on Kubernetes Fault Tolerance with Strimzi, AWS AddOns and EBS

    Grab updated its Kafka on Kubernetes setup to improve fault tolerance and completely eliminate human intervention when Kafka brokers terminate unexpectedly. To address the shortcomings of the initial design, the team integrated the AWS Node Termination Handler (NTH), used the AWS Load Balancer Controller for target group mapping, and switched to EBS volumes for storage.

  • Using Code Instrumentation for Fault Injection at the Application Level at eBay

    eBay engineers have been using fault injection techniques to improve the reliability of the notification platform and explore its weaknesses. While fault injection is a common industry practice, eBay took a novel approach, using code instrumentation to bring fault injection to the application level.

  • Atlassian Exceeds 99.9999% Availability Using Sidecars and Highly Fault-Tolerant Design

    Atlassian recently described how it exceeded 99.9999% availability with its Tenant Context Service. It achieved this by implementing highly autonomous client sidecars that can proactively shield themselves from complete AWS region failures. To accomplish this, the sidecars query multiple services concurrently and ensure that requests are entirely isolated internally.

  • Dealing with Thundering Herd at Braintree

    Braintree engineer Anthony Ross explained in a recent article how introducing some random jitter into retry intervals for failed tasks solved a thundering herd issue which was impacting the efficiency of their payment dispute management API.

  • Failsafe 3.2 Released with New Resilience Policies

    Failsafe, a lightweight fault tolerance library for Java 8+, shipped its major 3.0 release in November 2021. More recently, Failsafe announced version 3.2, which introduced new Rate Limiter and Bulkhead policies. Failsafe also integrates with asynchronous code such as Java’s CompletableFuture.

  • What's New in MicroProfile 5.0

    Delivered under the auspices of the MicroProfile Working Group five months after the release of MicroProfile 4.1, the anticipated MicroProfile 5.0 is now available to the Java community. This release aligns with Jakarta EE 9.1 and updates all eight community-developed core APIs and one standalone API.

  • Microsoft Announces Azure Chaos Studio in Public Preview

    At its recent Ignite conference, Microsoft announced the public preview of Azure Chaos Studio, a fully managed experimentation service that helps customers track, measure, and mitigate faults through controlled chaos engineering, improving the resilience of their cloud applications.

  • What's New in MicroProfile 4.0

    Delivered under the newly formed MicroProfile Working Group, the much-anticipated MicroProfile 4.0 is now available to the Java community. Features include alignment with Jakarta EE 8 and updates to all of the core APIs, while the standalone APIs remain unchanged. MicroProfile 4.0 introduced incompatible changes to five of the APIs, namely Config, Fault Tolerance, Health, Metrics and OpenAPI.

  • AWS Announces Chaos Engineering as a Service Offering

    AWS has announced the upcoming release of its chaos engineering as a service offering. The Fault Injection Service (FIS) will provide fully managed chaos experiments across a number of AWS services. The service includes pre-built templates that generate disruptions mimicking common real-world events, and it can be integrated into CI pipelines via its API.

  • What's New in MicroProfile 3.3

    The Eclipse Foundation released MicroProfile 3.3, featuring updates to five APIs: Rest Client, Config, Fault Tolerance, Metrics and Health. Other improvements include clarifications and enhancements to the specifications and documentation, improved integration among all the MicroProfile APIs, interoperability across different MicroProfile implementations, and a complete set of artifacts for each API.

  • Failsafe 2.0 Released with Composable Resilience Policies

    Failsafe, a zero-dependency Java library for handling failures, has released version 2.0 with support for resilience policy composition and a pluggable architecture that enables custom policy service providers.
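
The Braintree item above describes adding random jitter to retry intervals so that failed tasks do not all retry at once. The summary does not give Braintree's exact formula, so the following is only a minimal Python sketch assuming exponential backoff with "full jitter" (each delay drawn uniformly between zero and an exponentially growing, capped ceiling). The names `backoff_with_jitter` and `retry` and the default `base` and `cap` values are illustrative assumptions, not Braintree's code.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_with_jitter(attempt: int, base: float = 0.5, cap: float = 30.0,
                        rng: Optional[random.Random] = None) -> float:
    """Return a "full jitter" delay in seconds for a 0-based retry attempt.

    The ceiling grows exponentially (base * 2**attempt) up to `cap`; the actual
    delay is drawn uniformly from [0, ceiling], so clients that failed at the
    same moment spread their retries out instead of retrying in lockstep.
    """
    uniform = rng.uniform if rng is not None else random.uniform
    ceiling = min(cap, base * (2 ** attempt))
    return uniform(0, ceiling)


def retry(task: Callable[[], T], max_attempts: int = 5,
          sleep: Callable[[float], None] = time.sleep,
          base: float = 0.5, cap: float = 30.0,
          rng: Optional[random.Random] = None) -> T:
    """Call `task`, retrying on any exception with jittered backoff between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts - 1):
        try:
            return task()
        except Exception:
            sleep(backoff_with_jitter(attempt, base, cap, rng))
    return task()  # final attempt: any exception propagates to the caller


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Hypothetical task that fails twice before succeeding.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    print(retry(flaky, sleep=lambda s: print(f"sleeping {s:.3f}s")))
```

Passing `sleep` and `rng` as parameters keeps the sketch testable; in normal use the defaults apply. Without the random component, every client that failed at the same moment would compute identical delays and retry together, which is the thundering-herd pattern the Braintree article addresses.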

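Failsafe 3.2's Bulkhead policy restricts how many executions can run concurrently. The sketch below is not Failsafe's API (which is Java); it is a minimal Python illustration of the same idea, assuming a semaphore-based design in which callers that cannot obtain a permit within `max_wait` seconds are rejected. The names `Bulkhead` and `BulkheadFullError` are hypothetical.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(RuntimeError):
    """Raised when no permit becomes available within the allowed wait time."""


class Bulkhead:
    """Caps the number of concurrent calls into a dependency (hypothetical sketch)."""

    def __init__(self, max_concurrency: int, max_wait: float = 0.0) -> None:
        if max_concurrency < 1:
            raise ValueError("max_concurrency must be >= 1")
        if max_wait < 0:
            raise ValueError("max_wait must be >= 0")
        # One permit per allowed concurrent execution.
        self._permits = threading.BoundedSemaphore(max_concurrency)
        self._max_wait = max_wait

    def call(self, fn: Callable[[], T]) -> T:
        # Wait at most max_wait seconds for a permit; 0.0 rejects immediately when full.
        if not self._permits.acquire(timeout=self._max_wait):
            raise BulkheadFullError("bulkhead is full; call rejected")
        try:
            return fn()
        finally:
            self._permits.release()  # always return the permit, even if fn raises
```

Rejecting immediately (the default `max_wait=0.0`) keeps a slow dependency from tying up every caller thread; a small positive `max_wait` trades some latency for fewer rejections.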
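
The eBay item mentions using code instrumentation to inject faults at the application level. The summary does not describe eBay's actual mechanism, so the following Python decorator is only a hypothetical illustration of application-level fault injection: each instrumented call site looks up a named rule at call time and may add latency or raise an exception. `FaultInjector`, `FaultRule`, their fields, and the `"notification.send"` point name are assumptions made for this sketch.

```python
import functools
import random
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Type


@dataclass
class FaultRule:
    """Hypothetical rule: add `latency` seconds, then fail with probability `error_rate`."""
    error_rate: float = 0.0
    latency: float = 0.0
    exception: Type[Exception] = ConnectionError


@dataclass
class FaultInjector:
    """Registry of named fault rules consulted by instrumented functions at call time."""
    rules: Dict[str, FaultRule] = field(default_factory=dict)
    rng: random.Random = field(default_factory=random.Random)
    sleep: Callable[[float], None] = time.sleep
    enabled: bool = True

    def instrument(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that marks a function as a named fault-injection point."""
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            @functools.wraps(fn)
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                rule = self.rules.get(name)  # looked up per call, so rules can change at runtime
                if self.enabled and rule is not None:
                    if rule.latency > 0:
                        self.sleep(rule.latency)  # injected delay
                    if self.rng.random() < rule.error_rate:
                        raise rule.exception(f"injected fault at {name!r}")
                return fn(*args, **kwargs)
            return wrapper
        return decorator


if __name__ == "__main__":
    injector = FaultInjector()

    @injector.instrument("notification.send")  # hypothetical injection point
    def send_notification(user_id: str) -> str:
        return f"sent to {user_id}"

    injector.rules["notification.send"] = FaultRule(error_rate=0.5, latency=0.1)
    for uid in ("u1", "u2", "u3"):
        try:
            print(send_notification(uid))
        except ConnectionError as exc:
            print("failed:", exc)
```

Because rules are read on every call, faults can be switched on, tuned, or disabled at runtime without changing the instrumented function itself.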