Distributed Systems Content on InfoQ

News

Architecture & Design

Netflix Serves 84% of Query Results from Cache with Interval-Aware Caching in Apache Druid

Netflix improves Apache Druid performance with interval-aware caching, serving 84% of analytics results from cache and reducing query load by 33%. The system decomposes rolling-window queries into reusable time segments, enabling partial cache reuse and recomputation only for recent data. At scale, it reduces scan volume, improves P90 latency, and optimizes real-time analytics workloads.

Leela Kumili
on May 11, 2026
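
The decomposition described in the Netflix item above can be sketched roughly as follows. This is a minimal illustration, not Netflix's or Druid's implementation: the class name `IntervalCache`, the one-hour bucket size, the `settle_seconds` cutoff, and the summing of per-segment results are all assumptions made for the example. Query bounds are assumed to be given in epoch seconds and aligned to bucket boundaries.

```python
BUCKET_SECONDS = 3600  # assumed segment size: one hour


class IntervalCache:
    """Hypothetical per-segment cache for rolling-window aggregate queries."""

    def __init__(self, compute_segment, settle_seconds=3600):
        # compute_segment(start, end) -> number; stands in for the real query engine.
        self.compute_segment = compute_segment
        # Segments that ended less than settle_seconds ago are treated as still changing.
        self.settle_seconds = settle_seconds
        self.store = {}
        self.hits = 0
        self.misses = 0

    def query(self, start, end, now):
        """Aggregate over [start, end) by summing per-segment results."""
        first = start - start % BUCKET_SECONDS  # align down to a bucket boundary
        total = 0
        for seg_start in range(first, end, BUCKET_SECONDS):
            seg_end = seg_start + BUCKET_SECONDS
            settled = seg_end + self.settle_seconds <= now
            if settled and seg_start in self.store:
                # Older segment already computed: reuse it.
                self.hits += 1
                total += self.store[seg_start]
                continue
            # Recent or unseen segment: recompute it.
            self.misses += 1
            value = self.compute_segment(seg_start, seg_end)
            if settled:
                # Only cache segments that are no longer receiving data.
                self.store[seg_start] = value
            total += value
        return total
```

When the window slides forward by one hour, only the segments that were not settled at the previous query and the newest segment are recomputed; the rest come from the cache.
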
Architecture & Design

OpenAI Introduces WebSocket-Based Execution Mode to Reduce Latency in Agentic Workflows

OpenAI introduces a WebSocket-based execution mode for its Responses API to improve agentic workflow performance in coding agents and real-time AI systems. The update reduces latency by up to 40 percent by replacing HTTP request-response cycles with persistent connections, improving streaming, tool execution, and multi-step orchestration in production-scale AI systems.

Leela Kumili
on May 07, 2026
Architecture & Design

Designing Memory for AI Agents: Inside LinkedIn’s Cognitive Memory Agent

LinkedIn introduces Cognitive Memory Agent (CMA), a generative AI infrastructure layer enabling stateful, context-aware systems. It provides persistent memory across episodic, semantic, and procedural layers, supporting multi-agent coordination, retrieval, and lifecycle management. CMA addresses LLM statelessness and enables production-grade personalization and long-term context in AI applications.

Leela Kumili
on Apr 20, 2026
Architecture & Design

Pinterest Reduces Spark OOM Failures by 96% through Auto Memory Retries

Pinterest Engineering cut Apache Spark out-of-memory failures by 96% using improved observability, configuration tuning, and automatic memory retries. Staged rollout, dashboards, and proactive memory adjustments stabilized data pipelines, reduced manual intervention, and lowered operational overhead across tens of thousands of daily jobs.

Leela Kumili
on Apr 06, 2026
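
The automatic memory retries mentioned in the Pinterest item above can be illustrated with a small sketch. It is not Pinterest's implementation: the function `run_with_memory_retries`, the `OutOfMemory` exception, the growth factor, and the memory cap are hypothetical names and values chosen for the example, and `run_job` stands in for whatever actually submits the Spark job.

```python
class OutOfMemory(Exception):
    """Hypothetical signal that a job failed because it ran out of memory."""


def run_with_memory_retries(run_job, initial_mb, factor=1.5, max_mb=16384, max_attempts=3):
    """Run run_job(memory_mb), retrying with more memory after each OOM failure.

    Returns (result, memory_mb_used). Re-raises OutOfMemory once attempts are
    exhausted or the memory cap has been reached.
    """
    memory_mb = initial_mb
    for attempt in range(1, max_attempts + 1):
        try:
            return run_job(memory_mb), memory_mb
        except OutOfMemory:
            next_mb = min(int(memory_mb * factor), max_mb)
            # Give up if this was the last attempt or memory cannot grow further.
            if attempt == max_attempts or next_mb == memory_mb:
                raise
            memory_mb = next_mb
```

A caller would pass a function that launches the job with the given memory setting and raises `OutOfMemory` when the failure is memory-related; any other failure propagates unchanged and is not retried.
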
Development

Discord Engineers Add Distributed Tracing to Elixir's Actor Model without Performance Penalty

Discord engineering detailed how they added distributed tracing to Elixir's actor model. Their custom Transport library wraps messages with trace context and uses dynamic sampling to handle million-user fanouts. CPU optimizations included skipping unsampled traces and filtering context before deserialization, recovering 10+ percentage points of overhead.

Steef-Jan Wiggers
on Mar 28, 2026
Architecture & Design

Inside Agoda’s Storefront: A Latency-Aware Reverse Proxy for Improving DNS-Based Load Distribution

Agoda engineers developed Storefront, a Rust-based S3-compatible reverse proxy that improves load balancing, request routing, and observability across large-scale object storage systems. The proxy addresses the limitations of DNS-based distribution, implements latency-aware routing, cross-data-center optimizations, I/O safeguards, and credential-less authentication, and exposes telemetry via OpenTelemetry.

Leela Kumili
on Mar 27, 2026
Architecture & Design

Inside Netflix’s Graph Abstraction: Handling 650 TB of Graph Data in Milliseconds Globally

Netflix engineers built Graph Abstraction, a high-throughput platform managing 650 TB of graph data with millisecond latency. Supporting services from Netflix Gaming’s social graphs to operational topology graphs, it maintains global availability via asynchronous replication. This article covers its architecture, caching, and traversal design for high-scale performance.

Leela Kumili
on Mar 23, 2026
Architecture & Design

From Minutes to Seconds: Uber Boosts MySQL Cluster Uptime with Consensus Architecture

Uber redesigned its MySQL fleet using a consensus-driven architecture based on MySQL Group Replication, reducing cluster failover time from minutes to seconds. By moving leader election and failure detection into the database layer, Uber improved availability, simplified external orchestration, and strengthened consistency across thousands of production clusters.

Leela Kumili
on Mar 11, 2026
Architecture & Design

Hybrid Cloud Data at Uber: How Engineers Solved Extreme-Scale Replication Challenges

Uber’s HiveSync team optimized Hadoop DistCp to handle multi-petabyte replication across hybrid cloud and on-premise data lakes. Enhancements include task parallelization, Uber jobs for small transfers, and improved observability, enabling 5x replication capacity and seamless on-premise-to-cloud migration.

Leela Kumili
on Mar 02, 2026
Architecture & Design

uForwarder: Uber’s Scalable Kafka Consumer Proxy for Efficient Event-Driven Microservices

Uber has open-sourced uForwarder, a push-based Kafka consumer proxy built to handle trillions of messages and multiple petabytes of data daily. The system introduces context-aware routing, head-of-line blocking mitigation, adaptive auto-rebalancing, and partition-level delay processing to improve scalability, workload isolation, and hardware efficiency in large-scale event-driven microservices.

Leela Kumili
on Feb 23, 2026
Architecture & Design

How Dropbox Built a Scalable Context Engine for Enterprise Knowledge Search

Dropbox engineers have detailed how the company built the context engine behind Dropbox Dash, revealing a shift toward index-based retrieval, knowledge graph-derived context, and continuous evaluation to support enterprise AI at scale.

Matt Foster
on Feb 18, 2026
Architecture & Design

Uber and OpenAI Retool Rate Limiting Systems

Uber and OpenAI are replacing static rate limits with adaptive, infrastructure-level platforms. Uber’s Global Rate Limiter utilizes probabilistic shedding to manage 80M RPS, while OpenAI’s Access Engine implements a credit waterfall to prevent user interruptions. Both architectures utilize distributed enforcement and soft controls to maintain system stability and service continuity at scale.

Patrick Farry
on Feb 17, 2026
Architecture & Design

GitHub Reworks Layered Defenses after Legacy Protections Block Legitimate Traffic

GitHub engineers recently traced user reports of unexpected “Too Many Requests” errors to abuse-mitigation rules that had accidentally remained active long after the incidents that prompted them.

Matt Foster
on Feb 04, 2026
Development

Cloudflare Open Sources tokio-quiche, Promising Easier QUIC and HTTP/3 in Rust

Cloudflare has open-sourced tokio-quiche, an asynchronous QUIC and HTTP/3 Rust library that wraps its battle-tested quiche implementation with the Tokio runtime to simplify the development of high-performance QUIC applications. Cloudflare has used the library internally to back edge services, including the Oxy HTTP proxies and the MASQUE-based tunnels that replace the WireGuard-based tunnels in the WARP client.

Olimpiu Pop
on Dec 27, 2025
Architecture & Design

Benchmarking Beyond the Application Layer: How Uber Evaluates Infrastructure Changes and Cloud SKUs

Uber’s Ceilometer framework automates infrastructure performance benchmarking beyond applications. It standardizes testing across servers, workloads, and cloud SKUs, helping teams validate changes, identify regressions, and optimize resources. Future plans include AI integration, anomaly detection, and continuous validation.

Leela Kumili
on Dec 26, 2025
