Infrastructure Content on InfoQ

Articles

DevOps

Ceph RBD Turns 15: a Story of Open Source Creation

Fifteen years ago, Ceph RBD began as a community-driven idea that grew into essential infrastructure powering today's cloud platforms. This insider story from Yehuda Sadeh-Weinraub reveals how two developers started a distributed storage project that now supports OpenStack and Kubernetes through transparent, collaborative development.

Yehuda Sadeh-Weinraub
on Jul 07, 2025
DevOps

Analyzing Apache Kafka Stretch Clusters: WAN Disruptions, Failure Scenarios, and DR Strategies

This article analyzes the dynamics of Apache Kafka Stretch Clusters, assesses the impact of WAN disruptions, and outlines Disaster Recovery (DR) strategies. The focus is on high availability and data integrity across multi-region deployments, and on operational resilience that protects vital services against service level agreement violations.

Srikanth Daggumalli Nishchai Jayanna Manjula
on Jun 20, 2025
Cloud

Designing Resilient Event-Driven Systems at Scale

Learn how to design resilient event-driven systems that scale. Explore key patterns like shuffle sharding and decoupling queues to handle load spikes and failures. Understand common pitfalls like over-relying on retries and neglecting observability for robust, scalable architectures.

Rajesh Kumar Pandey
on May 30, 2025
Development

Binary Size Matters: the Challenges of Fitting Complex Applications in Storage-Constrained Devices

This article explores developing software for microcontrollers in C or C++, where the main constraints are limited volatile memory and the embedded hardware platform on which the software runs. It shows how to adopt languages like C++ while optimizing for binary size under stringent hardware constraints, and how to trade off runtime efficiency against binary size in architecture decisions.

Paulo Martinez
on May 16, 2025
Architecture & Design

Legacy Modernization: Architecting Real-Time Systems around a Mainframe

At its heart, our transformation journey is about breaking dependencies at multiple levels. Many enterprises face similar challenges with legacy systems: tightly coupled architectures that are difficult to scale, change, or maintain. For us at National Grid, the solution came through four complementary paradigms that worked together to enable different forms of decoupling.

Jason Roberts, Sonia Mathew
on Apr 30, 2025
Development

How to Compute without Looking: a Sneak Peek into Secure Multi-Party Computation

This article shows how you can compute a function across multiple parties that do not trust each other without forcing them to share their individual inputs. This technique can be used to split secrets among parties, perform logical operations, or count votes in a way that ensures data privacy is preserved.

Debasish Ray Chawdhuri
on Mar 31, 2025
AI, ML & Data Engineering

Eclipse LMOS: Launching AI Agents across Europe at Breakneck Speed

In this talk, the authors share some of their company's key learnings in developing customer-facing LLM-powered applications deployed across Europe. They used multi-agent architecture and systems design to create an open-source set of tools, a framework, and a full-fledged platform to accelerate the development of AI agents. This is a summary of a presentation from InfoQ Dev Summit Boston 2024.

Arun Joseph, Patrick Whelan
on Feb 17, 2025
Architecture & Design

Transforming Legacy Healthcare Systems: a Journey to Cloud-Native Architecture

Discover how Livi navigated the complexities of transitioning MJog, a legacy healthcare system, to a cloud-native architecture, sharing valuable insights for successful tech modernization. Our experience illustrates that transitioning from legacy systems to cloud-based microservices is not a one-time project, but an ongoing journey.

Leander Vanderbijl
on Nov 18, 2024
AI, ML & Data Engineering

Efficient Resource Management with Small Language Models (SLMs) in Edge Computing

Small Language Models (SLMs) bring AI inference to the edge without overwhelming resource-constrained devices. In this article, author Suruchi Shah dives into how SLMs can be used in edge computing applications to learn and adapt to patterns in real time, reducing the computational burden and making edge devices smarter.

Suruchi Shah
on Nov 11, 2024
Architecture & Design

Cell-Based Architecture Adoption Guidelines

The challenges in building modern, reliable, and understandable distributed systems continue to grow, and cell-based architecture is a valuable way to accept and isolate failures while staying reliable. Organizations must ensure that cell-based architecture is the right fit for them and that the migration will not cause more problems than it solves.

Guy Coleman
on Nov 04, 2024
Architecture & Design

Taking Advantage of Cell-Based Architectures to Build Resilient and Fault-Tolerant Systems

Cell-based architectures offer a robust approach to building resilient systems. They achieve this through the core principles of isolation, autonomy, and replication. Each cell manages its resources and makes decisions autonomously. Observability for cell-based architecture requires a tailored approach to address the unique challenges and opportunities presented by this distributed system design.

Yury Niño Roa
on Oct 21, 2024
Cloud

Optimizing Wellhub Autocomplete Service Latency: a Multi-Region Architecture

Every company wants fast, reliable, and low-latency services. Achieving these goals requires significant investment and effort. In this article, I will share how Wellhub invested in a multi-region architecture to achieve a low-latency autocomplete service.

Matheus Felisberto
on Oct 17, 2024
