Fault Tolerance Content on InfoQ

Articles

Architecture & Design

Cell-Based Architecture Adoption Guidelines

The challenges in building modern, reliable, and understandable distributed systems continue to grow, and cell-based architecture is a valuable way to accept failures, isolate them, and stay reliable when they occur. Organizations must ensure that a cell-based architecture is the right fit for them and that the migration will not cause more problems than it solves.

Guy Coleman
on Nov 04, 2024
Architecture & Design

Taking Advantage of Cell-Based Architectures to Build Resilient and Fault-Tolerant Systems

Cell-based architectures offer a robust approach to building resilient systems. They achieve this through the core principles of isolation, autonomy, and replication. Each cell manages its own resources and makes decisions independently. Observability for cell-based architectures requires a tailored approach to address the unique challenges and opportunities presented by this distributed system design.

Yury Niño Roa
on Oct 21, 2024
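
To make the isolation principle more concrete, here is a minimal Python sketch of a thin routing layer that pins each customer to exactly one cell, so a failure in one cell affects only the customers mapped to it. This is an assumption-laden illustration, not code from the article: `CellRouter`, `route`, the `unhealthy` set, and the cell names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CellRouter:
    """Hypothetical routing layer that pins each partition key to one cell."""

    cells: list[str]
    unhealthy: set[str] = field(default_factory=set)

    def route(self, partition_key: str) -> str:
        # Use a stable hash (not Python's per-process randomized hash()) so the
        # key-to-cell mapping is identical across processes and restarts.
        digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
        cell = self.cells[int.from_bytes(digest[:8], "big") % len(self.cells)]
        if cell in self.unhealthy:
            # Fail only the customers mapped to the broken cell instead of
            # spilling their traffic into healthy cells (blast-radius containment).
            raise RuntimeError(f"cell {cell} unavailable for key {partition_key!r}")
        return cell
```

For example, `CellRouter(["cell-a", "cell-b", "cell-c"]).route("customer-42")` always returns the same cell for that customer. Plain modulo hashing reshuffles most keys when the number of cells changes, so a production router would more likely keep an explicit key-to-cell mapping table; the sketch trades that away for brevity.
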
Architecture & Design

Article Series: Cell-Based Architectures: How to Build Scalable and Resilient Systems

In this article series, we take readers on a journey of discovery and provide a comprehensive overview and in-depth analysis of many key aspects of cell-based architectures, as well as practical advice for applying this approach to existing and new architectures.

Rafal Gancarz
on Oct 14, 2024
Java

Implementing Microservicilities with Quarkus and MicroProfile

Microservicilities is a list of cross-cutting concerns that a service must implement apart from the business logic. These concerns include invocation, elasticity and resiliency, among others. This article describes how Quarkus and MicroProfile may be used to implement these concerns.

Alex Soto
on May 13, 2021
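
MicroProfile Fault Tolerance expresses resiliency concerns declaratively, for example with the `@Retry` and `@Fallback` annotations. The Python sketch below is not the Quarkus or MicroProfile API. It only illustrates, under assumed defaults, the behavior such annotations typically describe: retry a failing call with exponential backoff, then fall back to an alternative result. The function `call_with_retry` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    max_retries: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `operation` up to 1 + max_retries times, then return `fallback()`."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                break
            # Exponential backoff between attempts: 0.1 s, 0.2 s, 0.4 s, ...
            sleep(base_delay_s * (2 ** attempt))
    # All attempts failed: degrade gracefully instead of propagating the error.
    return fallback()
```

The `sleep` parameter is injectable so the backoff schedule can be tested without real delays. In a MicroProfile service, roughly the same intent would be declared on the method with something like `@Retry(maxRetries = 3)` together with `@Fallback(fallbackMethod = "...")`.
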
DevOps

Designing Chaos Experiments, Running Game Days, and Building a Learning Organization: Chaos Conf Q&A

The second Chaos Conf event is taking place in San Francisco on 25-26 September. In preparation for the conference, InfoQ sat down with a number of the presenters and discussed topics such as the evolution and adoption of chaos engineering, key lessons about people and processes learned from running chaos experiments, and the biggest blockers to mainstream adoption.

Daniel Bryant
on Aug 31, 2019
Architecture & Design

Resilient Systems in Banking

Resilience is about tolerating failure, not eliminating it. To build a resilient system, you must build a system that absorbs shocks and either continues operating or recovers. Following best practices for resilient architecture, including established cloud patterns, allowed Starling Bank to build a bank from scratch in a year, against a backdrop of highly public outages among incumbent banks.

Greg Hawkins
on Oct 06, 2018
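
The circuit breaker is one widely used cloud pattern for absorbing shocks: after repeated failures it stops calling a struggling dependency, fails fast for a cool-down period, and then lets a trial call through to detect recovery. The summary does not say which patterns Starling Bank used, so treat the following as a generic, single-threaded Python sketch under assumed thresholds; `CircuitBreaker`, `CircuitOpenError`, and their parameters are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without invoking the dependency."""


class CircuitBreaker:
    """Minimal closed/open/half-open circuit breaker (illustrative, not thread-safe)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"  # cool-down elapsed: allow a trial call
        return "open"

    def call(self, operation: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast so the struggling dependency is not hammered further.
            raise CircuitOpenError("circuit open; rejecting call")
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                # Trip (or re-trip) the breaker and restart the cool-down window.
                self.opened_at = self.clock()
            raise
        # A success in the closed or half-open state resets the breaker.
        self.failures = 0
        self.opened_at = None
        return result
```

The injectable `clock` makes state transitions testable without waiting. A production breaker would also need locking and would usually limit the half-open state to a single in-flight trial call.
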
DevOps

Service Mesh: Promise or Peril?

Service meshes such as Istio, Linkerd, and Cilium are gaining increased visibility as companies adopt microservice architectures. The arguments for a service mesh are compelling: full-stack observability, transparent security, systems resilience, and more. But is a service mesh really the right solution for you? This article examines when a service mesh makes sense and when it might not.

Richard Li
on Jun 28, 2018
DevOps

Six Tips for Running Scalable Workloads on Kubernetes

Tips to ensure Kubernetes knows what is happening with your deployment: where best to schedule it, when it is ready to serve requests, and how to spread work across as many nodes as possible.

Joel Speed
on Mar 30, 2018
Development

A Comparison between Rust and Erlang

This article compares Erlang and Rust, detailing their similarities and differences. It may interest both Erlang developers looking into Rust and Rust developers looking into Erlang. A final section details each language's capabilities and shortcomings and argues for the possibility of leveraging the strengths of both languages in the same project.

Krishna Kumar Thokala
on Mar 13, 2018
DevOps

When Streams Fail: Implementing a Resilient Apache Kafka Cluster at Goldman Sachs

At QCon New York, Anton Gorshkov presented “When Streams Fail: Kafka Off the Shore”. The talk shared insight into how a platform team at a large financial institution designs and operates shared internal messaging clusters like Apache Kafka, and how it plans for, and resolves, the inevitable failures that occur.

Daniel Bryant
on Feb 13, 2018
Architecture & Design

But is it Safe?

While it is rare to hear the question, "Is this software safe?", the safety aspects of software are becoming increasingly important. The proliferation of IoT devices widens the impact that a small problem can have. Several techniques exist to help developers analyze and improve the safety of the software they create.

Gary K. Evans
on Oct 31, 2016

Storm Applied Review and Q&A with the Authors

Storm is a distributed, fault-tolerant, real-time computation system that was originally developed at BackType and later open sourced by Twitter. Storm Applied is a new book from Manning that aims to provide a practical guide on using Storm, both in a development and in a production setting. InfoQ has spoken with two of the book’s authors, Sean T. Allen and Matthew Jankowski.

Sergio De Simone
on Jul 27, 2015