
Fault Tolerance Content on InfoQ

Articles

  • Resilient Systems in Banking

    Resilience is about tolerating failure, not eliminating it. To build a resilient system, you must build a system that absorbs shocks, and continues or recovers. Following best practices for resilient architecture, including established cloud patterns, allowed Starling Bank to build a bank, from scratch, in a year, against a backdrop of highly public outages amongst incumbent banks.

  • Service Mesh: Promise or Peril?

    Service meshes such as Istio, Linkerd, and Cilium are gaining increased visibility as companies adopt microservice architectures. The arguments for a service mesh are compelling: full-stack observability, transparent security, systems resilience, and more. But is a service mesh really the right solution for you? This article examines when a service mesh makes sense and when it might not.

  • Six Tips for Running Scalable Workloads on Kubernetes

    Tips to ensure Kubernetes knows what is happening with your deployment: where best to schedule it, when it is ready to serve requests, and how to spread work across as many nodes as possible.

  • A Comparison between Rust and Erlang

    This article compares Erlang and Rust, detailing their similarities and differences. It may interest both Erlang developers looking into Rust and Rust developers looking into Erlang. A final section covers each language's capabilities and shortcomings in more detail and argues for the possibility of leveraging both languages' strengths in the same project.

  • When Streams Fail: Implementing a Resilient Apache Kafka Cluster at Goldman Sachs

    At QCon New York, Anton Gorshkov presented “When Streams Fail: Kafka Off the Shore”. The talk shared insight into how a platform team at a large financial institution designs and operates shared internal messaging clusters like Apache Kafka, and how the team plans for, and resolves, the inevitable failures that occur.

  • But is it Safe?

    While it is rare to hear the question, "Is this software safe?", the safety aspects of software are becoming increasingly important. The proliferation of IoT devices widens the impact a small problem can have. Several techniques exist to help developers analyze and improve the safety of the software they create.

  • Storm Applied Review and Q&A with the Authors

    Storm is a distributed, fault-tolerant, real-time computation system that was originally developed at BackType and later open sourced by Twitter. Storm Applied is a new book from Manning that aims to provide a practical guide to using Storm in both development and production settings. InfoQ has spoken with two of the book’s authors, Sean T. Allen and Matthew Jankowski.
