Fault Tolerance Content on InfoQ

  • Dealing with Thundering Herd at Braintree

    Braintree engineer Anthony Ross explained in a recent article how introducing random jitter into the retry intervals for failed tasks solved a thundering herd problem that was degrading the efficiency of their payment dispute management API.

  • Failsafe 3.2 Released with New Resilience Policies

    Failsafe, a lightweight fault tolerance library for Java 8+, launched the major 3.0 release in November 2021. More recently, Failsafe announced the availability of version 3.2 which introduced new Rate Limiter and Bulkhead policies. Failsafe also integrates with asynchronous code like Java’s CompletableFuture.

  • What's New in MicroProfile 5.0

    Delivered under the auspices of the MicroProfile Working Group and five months after the release of MicroProfile 4.1, the anticipated release of MicroProfile 5.0 was made available to the Java community. This new release features alignment with Jakarta EE 9.1 and updates to all eight community-developed core APIs and one standalone API.

  • Microsoft Announces Azure Chaos Studio in Public Preview

    At the recent Ignite conference, Microsoft announced the public preview of Azure Chaos Studio, a fully managed experimentation service that helps customers track, measure, and mitigate faults through controlled chaos engineering, improving the resilience of their cloud applications.

  • What's New in MicroProfile 4.0

    Delivered under the newly-formed MicroProfile Working Group, the much anticipated release of MicroProfile 4.0 was made available to the Java community. Features include alignment with Jakarta EE 8 and updates to all APIs. The standalone APIs remain unchanged. MicroProfile 4.0 was delivered with incompatible changes to five of the APIs, namely Config, Fault Tolerance, Health, Metrics and OpenAPI.

  • AWS Announces Chaos Engineering as a Service Offering

    AWS has announced the upcoming release of their chaos engineering as a service offering. The Fault Injection Service (FIS) will provide fully-managed chaos experiments across a number of AWS services. The service includes pre-built templates that generate disruptions mimicking common real-world events. It can be integrated into CI pipelines via API.

  • What's New in MicroProfile 3.3

    The Eclipse Foundation released MicroProfile 3.3 featuring updates to five APIs - Rest Client, Config, Fault Tolerance, Metrics and Health. Other improvements include clarifications and enhancements to specifications and documentation, improved integration among all the MicroProfile APIs, interoperability across different MicroProfile implementations, and a complete set of artifacts for each API.

  • Failsafe 2.0 Released with Composable Resilience Policies

    Failsafe, a zero-dependency Java library for handling failures, has released version 2.0 with support for resilience policy composition and a pluggable architecture that enables custom policy service providers.

  • The Human Side of Microservices

    A microservices architecture is a game changer for team communication, not a purely technical solution. If different teams don’t have stable, direct communication channels, the software they produce will suffer. The five key properties crucial for a successful microservices implementation are zero-configuration, auto-discovery, high redundancy, self-healing, and fault tolerance.

  • Gremlin Releases Application Level Fault Injection (ALFI) Platform for Targeted Chaos Experiments

    Gremlin Inc has released their second product offering in the “Failure-as-a-Service” domain: Application-Level Fault Injection (ALFI). Building upon their initial platform, which helped engineers create and run chaos experiments at the infrastructure level, ALFI enables failure injection at the application level via a native language library.

  • How to Achieve a Resilient Architecture

    To manage systems at scale, you must push your system almost to the breaking point while still being able to recover, and you must embrace failures, Adrian Hornsby writes in two blog posts sharing his experiences from more than a decade of working with large-scale systems and the patterns he has found useful.

  • Chaos Engineering at LinkedIn: The “LinkedOut” Failure Injection Testing Framework

    The LinkedIn Engineering team has recently discussed their “LinkedOut” failure injection testing framework. Hypotheses about service resilience can be formulated and failure triggers injected via the LinkedIn LiX A/B testing framework or via data in a cookie that is passed through the call stack using the Invocation Context (IC) framework. Failure scenarios include errors, delays and timeouts.

  • Microservices Resiliency and Fault Tolerance Using Istio and Kubernetes

    Animesh Singh and Tommy Li from IBM spoke at the recent KubeCon + CloudNativeCon North America 2017 Conference about microservices resiliency and fault tolerance using the Istio framework. They also showed how to configure and use circuit breakers and other resiliency features in Istio.

  • Chaos Engineering at Twilio

    The Twilio team describes their foray into Chaos Engineering where they use Gremlin to inject failures into their homegrown queuing system shards to test for automated recovery.

  • What's New in MicroProfile 1.2

    The Eclipse Foundation recently released MicroProfile version 1.2. New APIs added in this release cover improved communication among microservices, response to system faults, and JSON Web Token (JWT) support. Emily Jiang, CDI and MicroProfile development lead at IBM, and Michael Croft, Java middleware consultant at Payara, spoke to InfoQ about this latest release.
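
The jitter technique mentioned in the Braintree item above can be illustrated with a short sketch. This is not Braintree's implementation: the function names, the exponential backoff schedule, and the base and cap parameters are all assumptions made for illustration. The idea is that each failed task waits a random amount of time before retrying, so that tasks which failed together do not all retry at the same instant.

```python
import random
import time


def jittered_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a random delay in seconds before retry number `attempt` (0-based).

    Assumption: the upper bound grows exponentially and is capped; the actual
    delay is drawn uniformly between 0 and that bound.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def retry_with_jitter(task, max_attempts=5, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random):
    """Call `task` until it succeeds or `max_attempts` calls have failed.

    `sleep` and `rng` are injectable so the function can be exercised
    without real waiting or real randomness.
    """
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            # Randomized wait spreads out retries from tasks that failed together.
            sleep(jittered_delay(attempt, base, cap, rng))
```

Without the random component, every task that failed at the same moment would compute the same delay and hit the downstream service again in one burst. Drawing each delay from a range spreads those retries over time.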
