Low Latency Content on InfoQ
-
Agoda Builds Multimodal Content System to Bridge Images and Reviews in Travel Discovery
Agoda unifies hotel images and guest reviews using a shared topic taxonomy, enabling multimodal retrieval across 700M+ images and multilingual reviews with offline enrichment and low-latency serving.
-
Swiggy Improves Search Autocomplete Using Real Time Machine Learning Ranking
Swiggy detailed a real-time machine-learning ranking system for autocomplete built on OpenSearch. The architecture separates candidate generation from ranking, uses feature stores for real-time signals, and applies learning-to-rank models for improved relevance. It replaces heuristic ranking while maintaining strict latency constraints and enabling continuous model updates from user behavior signals.
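The split between candidate generation and ranking can be illustrated with a minimal sketch. It is not Swiggy's implementation: the `Candidate` type, the in-memory catalog, the `realtime_clicks` dictionary (standing in for a feature store), and the linear scoring weights (standing in for a learning-to-rank model) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    popularity: float  # hypothetical offline feature, in [0, 1]


def generate_candidates(prefix, catalog, limit=50):
    """Stage 1: cheap prefix match that bounds how many items get ranked."""
    p = prefix.lower()
    return [c for c in catalog if c.text.lower().startswith(p)][:limit]


def rank(candidates, realtime_clicks, weights=(0.3, 0.7)):
    """Stage 2: score each candidate and sort, highest score first.

    A linear score stands in for a learned ranking model here;
    realtime_clicks stands in for signals read from a feature store.
    """
    w_pop, w_click = weights

    def score(c):
        return w_pop * c.popularity + w_click * realtime_clicks.get(c.text, 0.0)

    return sorted(candidates, key=score, reverse=True)


def autocomplete(prefix, catalog, realtime_clicks, k=3):
    """Run both stages and return the top-k suggestion strings."""
    ranked = rank(generate_candidates(prefix, catalog), realtime_clicks)
    return [c.text for c in ranked[:k]]


catalog = [
    Candidate("pizza", 0.9),
    Candidate("pasta", 0.5),
    Candidate("paneer tikka", 0.4),
    Candidate("burger", 0.8),
]
clicks = {"paneer tikka": 1.0}  # hypothetical real-time signal
suggestions = autocomplete("p", catalog, clicks)
```

Keeping the first stage cheap and bounded means the more expensive scoring step only ever sees a small, fixed number of items, which is one common way to hold a latency budget.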
-
OpenAI Introduces Websocket-Based Execution Mode to Reduce Latency in Agentic Workflows
OpenAI introduces a WebSocket-based execution mode for its Responses API to improve agentic workflow performance in coding agents and real-time AI systems. The update reduces latency by up to 40 percent by replacing HTTP request-response cycles with persistent connections, improving streaming, tool execution, and multi-step orchestration in production-scale AI systems.
-
Cloudflare Introduces Flagship: an Edge-Native Feature Flag Service Built on OpenFeature
Cloudflare recently announced the closed beta of Flagship, a new feature flag service built directly into its global edge platform. The service lets teams control feature rollouts and experiment with changes without redeploying code, while evaluating flags locally in Cloudflare Workers rather than calling external flag services.
-
Inside Agoda’s Storefront: a Latency-Aware Reverse Proxy for Improving DNS Based Load Distribution
Agoda engineers developed Storefront, a Rust-based S3-compatible reverse proxy that improves load balancing, request routing, and observability across large-scale object storage systems. The proxy addresses DNS-based distribution limitations, implements latency-aware routing, cross-data-center optimizations, IO safeguards, credential-less authentication, and exposes telemetry via OpenTelemetry.
-
Inside Netflix’s Graph Abstraction: Handling 650TB of Graph Data in Milliseconds Globally
Netflix engineers built Graph Abstraction, a high-throughput platform managing 650 TB of graph data with millisecond latency. Supporting services from Netflix Gaming’s social graphs to operational topology graphs, it maintains global availability via asynchronous replication. This article covers its architecture, caching, and traversal design for high-scale performance.
-
Cloudflare Introduces Local Uploads for R2 to Cut Cross-Region Write Latency by 75%
Cloudflare has recently introduced Local Uploads for R2 in open beta. The new feature optimizes write performance for globally distributed users without changing bucket location, reducing cross-region write latency.
-
Reducing Onboarding from 48 Hours to 4: inside Amazon Key’s Event-Driven Platform
Amazon Key modernized its event platform by adopting a centralized, event-driven architecture built on Amazon EventBridge. The redesign processes millions of daily events with millisecond latency, improves schema governance, automates cross-account routing, and reduces service onboarding time from 48 hours to four, while maintaining 99.99 percent reliability.
-
Uber Moves from Static Limits to Priority-Aware Load Control for Distributed Storage
Uber engineers detailed how they evolved their storage platform from static rate limiting to a priority-aware load management system. The approach protects Docstore and Schemaless, Uber’s MySQL-based distributed databases, by colocating control with storage, prioritizing critical traffic, and dynamically shedding load under overload conditions.
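Priority-aware shedding can be sketched in a few lines. This is not Uber's implementation: the `PriorityLoadShedder` class, the three priority tiers (0 is most critical), and the 70 and 90 percent utilization thresholds are assumptions chosen for illustration.

```python
class PriorityLoadShedder:
    """Admit or reject requests based on priority and current utilization."""

    def __init__(self, capacity):
        self.capacity = capacity  # max concurrent requests (assumed metric)
        self.in_flight = 0

    def admit_threshold(self):
        """Return the lowest-priority tier still admitted (0 = most critical).

        Returns -1 when the node is saturated and nothing is admitted.
        """
        utilization = self.in_flight / self.capacity
        if utilization < 0.7:
            return 2  # healthy: admit all tiers
        if utilization < 0.9:
            return 1  # under pressure: shed tier 2
        if utilization < 1.0:
            return 0  # near overload: critical traffic only
        return -1

    def try_admit(self, priority):
        """Admit the request if its tier is within the current threshold."""
        if priority > self.admit_threshold():
            return False
        self.in_flight += 1
        return True

    def release(self):
        """Mark one admitted request as finished."""
        self.in_flight = max(0, self.in_flight - 1)


shedder = PriorityLoadShedder(capacity=10)
background = [shedder.try_admit(2) for _ in range(8)]
```

Unlike a static per-client limit, the admission decision here depends on the node's current load, so low-priority traffic is rejected first while critical requests continue until capacity is exhausted.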
-
From On-Demand to Live: Netflix Streaming to 100 Million Devices in under 1 Minute
Netflix’s global live streaming platform powers millions of viewers with cloud-based ingest, custom live origin, Open Connect delivery, and real-time recommendations. This article explores the architecture, low-latency pipelines, adaptive bitrate streaming, and operational monitoring that ensure reliable, scalable, and synchronized live event experiences worldwide.
-
Reddit Migrates Comment Backend from Python to Go Microservice to Halve Latency
Reddit has rebuilt its core backend, migrating Comments, Accounts, Posts, and Subreddits from a legacy Python monolith to Go microservices. The migration improves performance, halves critical write latency, and modernizes the platform for future scalability while preserving correctness across multiple datastores.
-
Airbnb Adds Adaptive Traffic Control to Manage Key Value Store Spikes
Airbnb upgraded Mussel, its multi-tenant key-value store, replacing static per-client rate limits with an adaptive, resource-aware traffic control system. The redesign ensures resilience during traffic spikes, protects critical workflows, and maintains fair usage across thousands of tenants while scaling efficiently.
-
Inside Uber’s Query Architecture: Simplifying Layers and Improving Observability
Uber rebuilt its Apache Pinot query architecture, replacing the Presto-based Neutrino system with a lightweight proxy called Cellar and Pinot’s Multi-Stage Engine Lite Mode. The redesign simplifies SQL execution, improves resource management, and ensures predictable performance for large-scale analytics workloads.
-
Uber Achieves 150M Reads per Second with CacheFront Improvements
Uber has updated its CacheFront architecture to handle over 150 million reads per second. The new design improves consistency and reduces stale reads by integrating Flux for MySQL binlog tailing, enhancing the storage engine, and introducing Cache Inspector for monitoring and optimization.
-
Allegro Reduces Kafka Producer Latency Outliers by 82% after Switching to XFS
Allegro experimented with different performance optimization options to improve Apache Kafka producer tail latency and eventually switched all its clusters to the XFS filesystem. The company used Kafka protocol sniffing, JVM profiling, and eBPF, which proved instrumental in identifying and eliminating performance bottlenecks.