Distributed Systems Content on InfoQ

News

AI, ML & Data Engineering

QCon London 2026 Announces Tracks: AI Engineering, Building Teams, Tech of Finance, and More

The QCon London 2026 tracks are live: 15 practitioner-curated deep dives on AI adoption, resilient architectures, distributed systems, performance, modern languages, data, security, and Staff+ leadership, rooted in real production lessons.

Artenisa Chatziou
on Nov 03, 2025
Architecture & Design

Airbnb’s Mussel V2: Next-Gen Key Value Storage to Unify Streaming and Bulk Ingestion

Airbnb’s engineering team re-architected its internal key-value storage system, Mussel, to unify streaming and bulk ingestion and simplify operations. Built on Kubernetes, Kafka, and a NewSQL backend, the new version sustains over 100,000 writes per second and sub-25 ms read latencies on 100-terabyte tables, improving scalability, reliability, and operational efficiency across Airbnb’s internal services.

Leela Kumili
on Oct 24, 2025
Architecture & Design

How LinkedIn Built Enterprise Multi-Agent AI on Existing Messaging Infrastructure

LinkedIn extended its generative AI application platform to support multi-agent systems by repurposing its existing messaging infrastructure as an orchestration layer. This let the company scale AI agents without building new coordination technology from scratch, achieving global availability and supporting complex multi-step workflows through agent coordination.

Eran Stiller
on Sep 15, 2025
Architecture & Design

LinkedIn Re-Architects Edge-Building System to Support Diverse Inference Workflows

LinkedIn has detailed its re-architected edge-building system, an evolution designed to support diverse inference workflows for delivering fresher and more personalized recommendations to members worldwide. The new architecture addresses growing demands for real-time scalability, cost efficiency, and flexibility across its global platform.

Leela Kumili
on Sep 02, 2025
Cloud

Amazon SQS Fair Queues: a New Approach to Multi-Tenant Resiliency

AWS's new fair queues for Amazon SQS mitigate the "noisy neighbor" problem in multi-tenant systems. The feature keeps message dwell times low for quieter tenants without requiring changes to consumer code, improving both performance and fairness and helping developers maintain consistent service quality across tenants.

Steef-Jan Wiggers
on Jul 31, 2025
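The noisy-neighbor effect can be illustrated with a generic per-tenant fair dispatcher. This is a minimal sketch, not SQS's internal algorithm: the `FairDispatcher` class and the tenant names are hypothetical, and round-robin across tenants is a stand-in fairness technique. According to AWS, producers opt in by setting a message group ID (identifying the tenant) on messages sent to a standard queue, and the service handles prioritization.

```python
from __future__ import annotations

from collections import OrderedDict, deque
from typing import Optional, Tuple


class FairDispatcher:
    """Hypothetical in-memory dispatcher: round-robin across tenants, FIFO within each."""

    def __init__(self) -> None:
        # Insertion order doubles as the rotation order; only tenants with a backlog are kept.
        self.queues: OrderedDict[str, deque[str]] = OrderedDict()

    def send(self, tenant: str, message: str) -> None:
        self.queues.setdefault(tenant, deque()).append(message)

    def receive(self) -> Optional[Tuple[str, str]]:
        if not self.queues:
            return None
        tenant, pending = next(iter(self.queues.items()))
        message = pending.popleft()
        if pending:
            self.queues.move_to_end(tenant)  # send this tenant to the back of the rotation
        else:
            del self.queues[tenant]  # no backlog left for this tenant
        return tenant, message


# A noisy tenant floods the queue before a quiet tenant sends one message.
dispatcher = FairDispatcher()
for i in range(1000):
    dispatcher.send("noisy-tenant", f"job-{i}")
dispatcher.send("quiet-tenant", "report")

first_two = [dispatcher.receive() for _ in range(2)]
print(first_two)  # [('noisy-tenant', 'job-0'), ('quiet-tenant', 'report')]
```

With a single FIFO queue, the quiet tenant's message would wait behind all 1,000 noisy messages; with per-tenant rotation it is delivered second.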
Cloud

Microsoft Azure Enhances Observability with OpenTelemetry Support for Logic Apps and Functions

Microsoft has expanded OpenTelemetry support in Azure Logic Apps and Azure Functions, improving observability and interoperability across platforms. OpenTelemetry, an open-source, vendor-neutral framework, lets services generate and correlate telemetry data, enabling richer diagnostics than standard telemetry alone. With simplified configuration and integration, Microsoft aims to standardize observability across its cloud services.

Steef-Jan Wiggers
on Jun 23, 2025
Cloud

Temporal on AWS Aims to Ease Building Resilient Distributed Systems

Temporal Technologies, the company that created Temporal, an open-source microservices orchestration platform focused on durable execution, has made Temporal Cloud available on AWS Marketplace. By offering its service through AWS, the company aims to simplify the development of resilient distributed systems for large-scale applications.

Steef-Jan Wiggers
on May 09, 2025
Architecture & Design

Mezzalira at QCon London: Micro-Frontends from Design to Organisational Benefits and Deployments

During his QCon London presentation, Luca Mezzalira, principal architect at AWS, shared his experience building the ideal micro-frontend platform. He outlined criteria for deciding whether micro-frontends are right for a company, the core principles for designing an architecture that fits a specific use case, and deployment strategies for distributed architectures.

Olimpiu Pop
on Apr 30, 2025
Architecture & Design

Lessons on How to Get Timeouts, Retries and Idempotency Right from Sam Newman at QCon London

At QCon London, Sam Newman, the architect often credited with coining the term microservices, went back to basics to highlight three things that are critical to get right when working with distributed systems: timeouts, retries, and idempotency. Throughout the talk, he presented mechanisms for making distributed systems more robust.

Olimpiu Pop
on Apr 09, 2025
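The interplay of timeouts, retries, and idempotency can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the talk: `PaymentService` is a hypothetical server that deduplicates requests by idempotency key, and a timeout is simulated by raising `TimeoutError` after the work has already been done, which is exactly the case where a blind retry would charge twice. The retry loop uses capped exponential backoff with full jitter, a common choice rather than one attributed to Newman.

```python
import random
import time
import uuid
from typing import Callable


class PaymentService:
    """Hypothetical server that deduplicates requests by idempotency key."""

    def __init__(self, lost_responses: int) -> None:
        self.lost_responses = lost_responses  # how many responses to "lose" after processing
        self.results: dict = {}  # idempotency key -> stored response
        self.charges = 0  # number of times the side effect actually ran

    def charge(self, idempotency_key: str, amount: int) -> str:
        if idempotency_key not in self.results:
            self.charges += 1  # side effect runs at most once per key
            self.results[idempotency_key] = f"charged {amount}"
        if self.lost_responses > 0:
            self.lost_responses -= 1
            raise TimeoutError("response lost")  # work is done, but the caller times out
        return self.results[idempotency_key]


def call_with_retries(op: Callable[[], str], max_attempts: int = 4,
                      base_delay: float = 0.01, max_delay: float = 0.1) -> str:
    """Retry op on timeouts using capped exponential backoff with full jitter."""
    for attempt in range(max_attempts):
        try:
            return op()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the failure
            backoff = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, backoff))
    raise RuntimeError("max_attempts must be at least 1")


service = PaymentService(lost_responses=2)
key = str(uuid.uuid4())  # one key per logical operation, reused on every retry
result = call_with_retries(lambda: service.charge(key, 100))
print(result, service.charges)  # charged 100 1
```

The key design point is that the client generates the idempotency key once per logical operation, not once per attempt; otherwise the server cannot tell a retry from a new request.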
Architecture & Design

Dapr Agents: Scalable AI Workflows with LLMs, Kubernetes & Multi-Agent Coordination

Dapr Agents is a new framework for building scalable AI agents powered by Large Language Models (LLMs). With robust workflows, multi-agent coordination, and a cloud-neutral architecture, it is designed to let enterprises deploy thousands of resilient agents. Because it builds on Dapr’s proven infrastructure, Dapr Agents brings reliability and observability to AI-driven applications.

Eran Stiller
on Mar 20, 2025
Architecture & Design

How Monzo Bank Built a Cost-Effective, Unorthodox Backup System to Ensure Resilient Banking

Monzo Bank recently revealed Stand-in, an independent backup system on GCP that keeps essential banking services running during outages of its primary application or its AWS infrastructure. Unlike a traditional backup, Stand-in is a minimal standalone system that supports only key operations, and its cost-conscious design runs at 1% of the operational costs of the primary deployment.

Eran Stiller
on Feb 24, 2025
AI, ML & Data Engineering

Distributed Multi-Model Database Aerospike 8 Brings Support for Real-Time ACID Transactions

Aerospike has announced version 8.0 of its distributed multi-model database, bringing support for distributed ACID transactions. According to the company, this enables large-scale online transaction processing (OLTP) applications such as banking, e-commerce, inventory management, health care, and order processing.

Sergio De Simone
on Feb 16, 2025
Architecture & Design

Inside Netflix’s Distributed Counter: Scalable, Accurate, and Real-Time Counting at Global Scale

Netflix engineers recently published a deep dive into their Distributed Counter Abstraction, a scalable service designed to track user interactions, feature usage, and business performance metrics with low latency. The system balances performance, accuracy, and cost through configurable counting modes, resilient data aggregation, and a globally distributed architecture.

Eran Stiller
on Dec 10, 2024
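A common building block for eventually consistent counting is to record each increment as an event tagged with an idempotency token and aggregate the events later, so that retried writes are not double-counted. The sketch below is a heavily simplified, hypothetical single-process illustration of that idea; `EventLogCounter` and its methods are not Netflix's API, and a production system would persist events durably and roll them up incrementally in the background.

```python
from typing import Dict


class EventLogCounter:
    """Hypothetical single-process counter: log increments, aggregate them later."""

    def __init__(self) -> None:
        self.events: Dict[str, int] = {}  # idempotency token -> delta
        self.last_rollup = 0  # value served to readers

    def add(self, token: str, delta: int = 1) -> None:
        # A retried write reuses its token, so it cannot be counted twice.
        self.events.setdefault(token, delta)

    def rollup(self) -> None:
        # Recompute from the full log; a real system would aggregate incrementally.
        self.last_rollup = sum(self.events.values())

    def read(self) -> int:
        # Eventually consistent: increments logged since the last rollup are not visible yet.
        return self.last_rollup


counter = EventLogCounter()
counter.add("req-1")
counter.add("req-1")  # client retry after a timeout
counter.add("req-2")
before = counter.read()
counter.rollup()
after = counter.read()
print(before, after)  # 0 2
```

Separating cheap, idempotent writes from periodic aggregation is what lets such a design trade read freshness for write throughput and accuracy under retries.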
Cloud

Improving Distributed System Data Integrity with Amazon S3 Conditional Writes

AWS recently announced support for conditional writes in Amazon S3, allowing clients to check whether an object exists before creating it. The feature prevents an upload from overwriting an existing object, so applications with multiple concurrent writers can coordinate on a key without building their own locking mechanism.

Steef-Jan Wiggers
on Aug 28, 2024
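In S3, the existence check is expressed by sending an `If-None-Match: *` header with a `PutObject` request; if an object with that key already exists, S3 rejects the write with `412 Precondition Failed` instead of overwriting it (recent boto3 versions expose this as the `IfNoneMatch` parameter of `put_object`). To keep the example runnable without AWS credentials, the sketch below models only that create-if-absent rule with a hypothetical in-memory bucket; it is not the S3 API, and unlike S3 it makes no atomicity guarantees across threads.

```python
from typing import Dict, Optional


class PreconditionFailed(Exception):
    """Stands in for S3's HTTP 412 Precondition Failed response."""


class InMemoryBucket:
    """Hypothetical toy bucket modeling create-if-absent writes; not the S3 API."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes, if_none_match: Optional[str] = None) -> None:
        if if_none_match == "*" and key in self.objects:
            raise PreconditionFailed(key)  # object exists: refuse to overwrite
        self.objects[key] = body  # unconditional writes still overwrite


bucket = InMemoryBucket()
bucket.put("jobs/42/result.json", b'{"writer": "A"}', if_none_match="*")
try:
    bucket.put("jobs/42/result.json", b'{"writer": "B"}', if_none_match="*")
    second_writer_won = True
except PreconditionFailed:
    second_writer_won = False  # writer B must re-read or pick another key
print(second_writer_won, bucket.objects["jobs/42/result.json"])  # False b'{"writer": "A"}'
```

Because the first writer wins and later writers get an explicit failure, the key itself acts as a lightweight lock or "claim" without a separate coordination service.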
Architecture & Design

How Amazon Aurora Serverless Manages Resources and Scaling for Fleets of 10K+ Instances

AWS engineers published a paper describing the evolution and latest design of resource management and scaling for the Amazon Aurora Serverless platform. Aurora Serverless uses a combination of components at different levels to create a holistic approach for dynamically scaling and adjusting resources to satisfy the needs of customer workloads.

Rafal Gancarz
on Aug 23, 2024