Optimization Content on InfoQ
-
Netflix Serves 84% of Query Results from Cache with Interval-Aware Caching in Apache Druid
Netflix improves Apache Druid performance with interval-aware caching, serving 84% of analytics results from cache and reducing query load by 33%. The system decomposes rolling-window queries into reusable time segments, enabling partial cache reuse and recomputation only for recent data. At scale, it reduces scan volume, improves P90 latency, and optimizes real-time analytics workloads.
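The decomposition idea can be illustrated with a minimal sketch. This is not Netflix's or Druid's implementation: the `IntervalCache` class, the `compute` callback, the integer timestamps, and the `fresh_window` parameter are all hypothetical, and the sketch assumes a sum-style aggregate whose per-bucket results can simply be added together and window boundaries that align to the bucket size.

```python
class IntervalCache:
    """Caches per-bucket results so overlapping rolling windows reuse them."""

    def __init__(self, compute, step, fresh_window):
        self.compute = compute            # hypothetical backend call: (start, end) -> number
        self.step = step                  # bucket width, in the same units as the timestamps
        self.fresh_window = fresh_window  # data newer than now - fresh_window may still change
        self.store = {}
        self.hits = 0
        self.misses = 0

    def query(self, start, end, now):
        """Aggregate [start, end) by summing per-bucket results."""
        total = 0
        cur = start
        while cur < end:
            nxt = min(cur + self.step, end)
            key = (cur, nxt)
            # Only buckets that end before the fresh window are safe to cache.
            settled = nxt <= now - self.fresh_window
            if settled and key in self.store:
                self.hits += 1
                value = self.store[key]
            else:
                self.misses += 1
                value = self.compute(cur, nxt)
                if settled:
                    self.store[key] = value
            total += value
            cur = nxt
        return total


# Usage: a 50-unit rolling window queried at two consecutive refreshes.
cache = IntervalCache(compute=lambda s, e: e - s, step=10, fresh_window=10)
first = cache.query(0, 50, now=50)    # cold cache: every bucket is computed
second = cache.query(10, 60, now=60)  # three older buckets come from the cache
```

On the second query only the two most recent buckets are recomputed, which is the partial reuse the summary describes.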
-
OpenAI Introduces Websocket-Based Execution Mode to Reduce Latency in Agentic Workflows
OpenAI introduces a WebSocket-based execution mode for its Responses API to improve agentic workflow performance in coding agents and real-time AI systems. The update reduces latency by up to 40 percent by replacing HTTP request-response cycles with persistent connections, improving streaming, tool execution, and multi-step orchestration in production-scale AI systems.
-
Cloudflare Builds High-Performance Infrastructure for Running LLMs
Cloudflare has recently announced new infrastructure designed to run large AI language models across its global network. As these models rely on costly hardware and must handle large volumes of incoming and outgoing text, Cloudflare separates the model's input processing and output generation onto different optimized systems.
-
Dropbox Collaborates with GitHub to Reduce Monorepo Size from 87GB to 20GB
Dropbox reduced its backend monorepo from 87GB to 20GB by optimizing Git delta compression in collaboration with GitHub. The changes improved clone times, CI performance, and developer velocity, highlighting how repository storage inefficiencies can impact large-scale engineering workflows.
-
Cloudflare Launches Code Mode MCP Server to Optimize Token Usage for AI Agents
Cloudflare has launched a new Model Context Protocol (MCP) server powered by Code Mode, enabling AI agents to interact with large APIs with minimal token usage. The server reduces context footprint across 2,500+ endpoints, improves multi-API orchestration, and provides a secure, code-centric execution environment for LLM agents.
-
Google’s TurboQuant Compression May Support Faster Inference, Same Accuracy on Less Capable Hardware
Google Research unveiled TurboQuant, a novel quantization algorithm that compresses large language models’ Key-Value caches by up to 6x. With 3.5-bit compression, near-zero accuracy loss, and no retraining needed, it allows developers to run massive context windows on significantly more modest hardware than previously required. Early community benchmarks confirm significant efficiency gains.
-
Pinterest Reduces Spark OOM Failures by 96% through Auto Memory Retries
Pinterest Engineering cut Apache Spark out-of-memory failures by 96% using improved observability, configuration tuning, and automatic memory retries. Staged rollout, dashboards, and proactive memory adjustments stabilized data pipelines, reduced manual intervention, and lowered operational overhead across tens of thousands of daily jobs.
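A minimal sketch of what an automatic memory retry loop could look like follows. It is an illustration only, not Pinterest's implementation or a Spark API: the `OutOfMemory` exception, the `job` callable, and the `growth` and `max_gb` parameters are hypothetical names chosen for the example.

```python
class OutOfMemory(Exception):
    """Stand-in for an out-of-memory failure reported by a job."""


def run_with_memory_retries(job, initial_gb, growth=1.5, max_gb=64, max_attempts=4):
    """Run job(memory_gb); on OOM, retry with more memory up to max_gb.

    Returns (result, attempts), where attempts lists the memory sizes tried.
    """
    memory_gb = initial_gb
    attempts = []
    for _ in range(max_attempts):
        attempts.append(memory_gb)
        try:
            return job(memory_gb), attempts
        except OutOfMemory:
            if memory_gb >= max_gb:
                break  # already at the ceiling, so retrying cannot help
            memory_gb = min(memory_gb * growth, max_gb)
    raise OutOfMemory(f"job still failing after attempts at {attempts} GB")


# Usage: a fake job that needs at least 10 GB to succeed.
def fake_job(memory_gb):
    if memory_gb < 10:
        raise OutOfMemory()
    return "ok"

result, attempts = run_with_memory_retries(fake_job, initial_gb=4, growth=2)
```

Capping the growth at `max_gb` keeps a job that can never fit from consuming ever larger allocations.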
-
Inside Agoda’s Storefront: a Latency-Aware Reverse Proxy for Improving DNS Based Load Distribution
Agoda engineers developed Storefront, a Rust-based S3-compatible reverse proxy that improves load balancing, request routing, and observability across large-scale object storage systems. The proxy addresses DNS-based distribution limitations, implements latency-aware routing, cross-data-center optimizations, IO safeguards, credential-less authentication, and exposes telemetry via OpenTelemetry.
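One common way to make routing latency-aware is to track a moving average of observed latency per backend and prefer the fastest. The sketch below shows that general technique under stated assumptions; it is not Storefront's code (which is written in Rust), and the `LatencyAwareRouter` class, the `alpha` smoothing factor, and the backend names are hypothetical.

```python
class LatencyAwareRouter:
    """Picks the backend with the lowest exponentially weighted average latency."""

    def __init__(self, backends, alpha=0.2):
        self.alpha = alpha                         # weight given to the newest sample
        self.ewma = {b: None for b in backends}    # None means no sample yet

    def record(self, backend, latency_ms):
        """Fold a new latency observation into the backend's moving average."""
        prev = self.ewma[backend]
        if prev is None:
            self.ewma[backend] = latency_ms
        else:
            self.ewma[backend] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def pick(self):
        """Return an unmeasured backend if any, else the lowest-latency one."""
        for backend, value in self.ewma.items():
            if value is None:
                return backend
        return min(self.ewma, key=self.ewma.get)


# Usage: record observed latencies, then route to the current best backend.
router = LatencyAwareRouter(["dc1", "dc2"], alpha=0.5)
router.record("dc1", 50)
router.record("dc2", 10)
best = router.pick()
```

Trying unmeasured backends first ensures every backend receives at least one sample before averages are compared.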
-
How Datadog Cut the Size of Its Agent Go Binaries by 77%
After the Datadog Agent grew from 428 MiB to 1.22 GiB over a period of 5 years, Datadog engineers set out to reduce its binary size. They discovered that most Go binary bloat comes from hidden dependencies, disabled linker optimizations, and subtle behaviors in the Go compiler and linker.
-
Python Workers Redux: Wasm Snapshots and Native uv Tooling
Cloudflare's latest updates to Python Workers bring near-instant cold starts, expanded package compatibility, and streamlined workflows via the uv package manager. By leveraging memory snapshots and WebAssembly, Cloudflare drastically reduces startup times, making Python a stronger choice for AI and data science applications.
-
Meta's Optimization Platform Ax 1.0 Streamlines LLM and System Optimization
Now stable, Ax is an open-source platform from Meta designed to help researchers and engineers apply machine learning to complex, resource-intensive experimentation. Over the past several years, Meta has used Ax to improve AI models, accelerate machine learning research, tune production infrastructure, and more.
-
Inside Uber’s Query Architecture: Simplifying Layers and Improving Observability
Uber rebuilt its Apache Pinot query architecture, replacing the Presto-based Neutrino system with a lightweight proxy called Cellar and Pinot’s Multi-Stage Engine Lite Mode. The redesign simplifies SQL execution, improves resource management, and ensures predictable performance for large-scale analytics workloads.
-
Meta Open Sources OpenZL: a Universal Compression Framework for Structured Data
Meta’s OpenZL is a compression framework that targets structured datasets, outperforming traditional methods like Zstandard. With a universal decompressor and custom compression plans, it simplifies operational deployment while achieving better compression ratios and speeds.
-
Cloudflare Achieves 99.99% Warm Start Rate for Workers with 'Shard and Conquer' Consistent Hashing
Cloudflare's "Shard and Conquer" technique cuts cold start rates on its serverless platform by 90%. It uses a consistent hash ring to route traffic so that Workers stay warm and latency stays low. The approach also accommodates larger applications as users demand richer functionality.
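For readers unfamiliar with consistent hash rings, here is a minimal generic sketch. It is not Cloudflare's implementation: the `HashRing` class, the virtual-node count, and the server and Worker names are hypothetical, and SHA-256 is used only as a convenient stable hash.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node is placed at many ring positions to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring position clockwise of the key."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


# Usage: the same key always lands on the same server.
ring = HashRing(["server-a", "server-b", "server-c"])
target = ring.node_for("worker-123")
```

Because a key's owner changes only when its nearest ring position is added or removed, taking one server out of the ring leaves keys on the remaining servers where they were, which is what lets already-warm instances keep receiving their traffic.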
-
Agoda Leverages ChatGPT in the CI/CD Process for SQL Stored Procedure Optimization
Agoda started using ChatGPT to optimize SQL stored procedures (SP) as part of its CI/CD process. After introducing the automated LLM-assisted step, the company observed shorter stored procedure optimization times, which lightened the load on DB developers. Agoda is working on making ChatGPT more accessible for SP optimization outside of the CI/CD pipeline.