Scalability Content on InfoQ
-
ProxySQL Introduces Multi-Tier Release Strategy with Stable, Innovative, and AI Tracks
ProxySQL 3.0.6 was recently released, along with a new multi-tier release strategy. The Stable Tier focuses on reliability and production use, the Innovative Tier introduces newer features earlier, and the AI/MCP Tier explores future capabilities, including AI integrations.
-
Netflix Uncovers Kernel-Level Bottlenecks While Scaling Containers on Modern CPUs
Engineers at Netflix have uncovered deep performance bottlenecks in container scaling that cannot be traced to Kubernetes or containerd alone but extend into the CPU architecture and the Linux kernel itself.
-
From Minutes to Seconds: Uber Boosts MySQL Cluster Uptime with Consensus Architecture
Uber redesigned its MySQL fleet using a consensus-driven architecture based on MySQL Group Replication, reducing cluster failover time from minutes to seconds. By moving leader election and failure detection into the database layer, Uber improved availability, simplified external orchestration, and strengthened consistency across thousands of production clusters.
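In a majority-based design such as MySQL Group Replication, the group keeps accepting writes only while more than half of its members can reach each other, so a group of n members tolerates (n - 1) / 2 failures, rounded down. A minimal sketch of that arithmetic follows; the function names are hypothetical and are not part of any MySQL API.

```python
def max_tolerated_failures(group_size: int) -> int:
    """Members that may fail while the survivors still form a majority."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    majority = group_size // 2 + 1  # smallest strict majority
    return group_size - majority


def has_quorum(reachable: int, group_size: int) -> bool:
    """True if the reachable members are a strict majority of the group."""
    return reachable > group_size // 2


for n in (3, 4, 5):
    print(n, "members tolerate", max_tolerated_failures(n), "failure(s)")
```

A four-member group tolerates no more failures than a three-member one, which is why consensus groups are usually sized with an odd number of members.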
-
uForwarder: Uber’s Scalable Kafka Consumer Proxy for Efficient Event-Driven Microservices
Uber has open-sourced uForwarder, a push-based Kafka consumer proxy built to handle trillions of messages and multiple petabytes of data daily. The system introduces context-aware routing, head-of-line blocking mitigation, adaptive auto-rebalancing, and partition-level delay processing to improve scalability, workload isolation, and hardware efficiency in large-scale event-driven microservices.
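A push-based proxy avoids head-of-line blocking by dispatching messages concurrently, so one slow message does not hold up the rest of its partition. A common way to keep at-least-once delivery under that model is to record completions out of order and commit only the end of the contiguous completed range. The sketch below shows that general technique under that assumption; it is not uForwarder's code, and `OffsetTracker` is a hypothetical name.

```python
class OffsetTracker:
    """Tracks out-of-order completions for a single partition (hypothetical helper).

    Messages may finish in any order, but the committable offset only advances
    past a contiguous run of completed messages, so a restart never skips an
    unfinished one.
    """

    def __init__(self, start_offset: int) -> None:
        self._next_commit = start_offset  # first offset not yet known to be done
        self._done: set[int] = set()      # completed offsets beyond _next_commit

    def mark_done(self, offset: int) -> None:
        if offset < self._next_commit:
            return  # already inside the committable range
        self._done.add(offset)
        # Advance over any contiguous run of completed offsets.
        while self._next_commit in self._done:
            self._done.remove(self._next_commit)
            self._next_commit += 1

    @property
    def committable(self) -> int:
        """Offset to commit (Kafka convention: the next offset to consume)."""
        return self._next_commit


tracker = OffsetTracker(start_offset=100)
tracker.mark_done(101)      # fast messages finish first...
tracker.mark_done(102)
print(tracker.committable)  # 100: offset 100 is still in flight
tracker.mark_done(100)      # ...then the slow one completes
print(tracker.committable)  # 103
```

Committing only the contiguous prefix means a crash can replay offsets 101 and 102, so handlers must tolerate duplicates, which is the usual trade-off of at-least-once delivery.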
-
Firestore Adds Pipeline Operations with over 100 New Query Features
Google has overhauled Firestore’s query engine, introducing "Pipeline operations" that enable complex server-side aggregations and array unnesting. The update shifts Firestore Enterprise toward an optional indexing model, allowing architects to prioritize write speed and lower costs. While it brings parity with MongoDB-style aggregations, the preview currently lacks real-time and emulator support.
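Array unnesting followed by a server-side aggregation is easiest to understand through what it computes. The plain-Python sketch below models those semantics over a few in-memory documents; it does not use the Firestore SDK, and `unnest`, `sum_by`, and the sample fields are hypothetical.

```python
from collections import defaultdict


def unnest(docs, field):
    """Yield one row per element of an array field, copying the other fields."""
    for doc in docs:
        for value in doc.get(field, []):
            row = dict(doc)
            row[field] = value
            yield row


def sum_by(rows, key, value_field):
    """Group rows by `key` and sum `value_field`, like GROUP BY with SUM."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value_field]
    return dict(totals)


orders = [
    {"id": "o1", "tags": ["books", "gifts"], "total": 30},
    {"id": "o2", "tags": ["books"], "total": 12},
]
print(sum_by(unnest(orders, "tags"), "tags", "total"))
# {'books': 42, 'gifts': 30}
```

Running this kind of pipeline on the server means the client receives only the per-tag totals rather than every matching document.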
-
Airbnb Expands Global Checkout with “Pay as a Local,” Scaling to 220 Markets in 14 Months
Airbnb expands its global checkout with the “Pay as a Local” initiative, supporting over 20 locally preferred payment methods across 220 markets. The company replatformed its payments system with domain-oriented services, reusable flow archetypes, and a centralized configuration, enhancing integration speed, reliability, testing, and observability for diverse payment methods worldwide.
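The combination of reusable flow archetypes and centralized configuration can be sketched as a lookup table: each payment method declares which archetype it follows and where it is offered, and checkout dispatches to a shared handler for that archetype. Every identifier below, including the archetype names and payment methods, is hypothetical and not taken from Airbnb's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentMethodConfig:
    method: str              # hypothetical payment-method identifier
    archetype: str           # which reusable flow handles this method
    markets: frozenset[str]  # country codes where the method is offered


# Hypothetical centralized configuration: onboarding a method that fits an
# existing archetype is a configuration change, not a new end-to-end flow.
CONFIG = [
    PaymentMethodConfig("card", "direct", frozenset({"US", "NL", "BR"})),
    PaymentMethodConfig("local_bank_redirect", "redirect", frozenset({"NL"})),
    PaymentMethodConfig("local_instant_transfer", "async", frozenset({"BR"})),
]

# One reusable handler per archetype (stubs that describe the step they take).
FLOWS = {
    "direct": lambda m: f"{m}: authorize synchronously",
    "redirect": lambda m: f"{m}: redirect guest to provider, then confirm",
    "async": lambda m: f"{m}: show pending state until provider webhook arrives",
}


def methods_for(market: str) -> list[str]:
    """Payment methods offered at checkout in a given market."""
    return [c.method for c in CONFIG if market in c.markets]


def start_payment(method: str) -> str:
    """Dispatch a payment to the flow archetype configured for its method."""
    config = next(c for c in CONFIG if c.method == method)
    return FLOWS[config.archetype](method)


print(methods_for("NL"))  # ['card', 'local_bank_redirect']
print(start_payment("local_instant_transfer"))
```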
-
Meta Applies Mutation Testing with LLMs to Improve Compliance Coverage
Meta applies large language models to mutation testing through its Automated Compliance Hardening system, generating targeted mutants and tests to improve compliance coverage, reduce overhead, and detect privacy and safety risks. The approach supports scalable, LLM-driven test generation and continuous compliance across Meta’s platforms.
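The core loop of mutation testing does not depend on LLMs: introduce a small fault (a mutant) and check whether the tests notice it. Meta's system uses LLMs to generate the targeted mutants and tests; the sketch below writes both by hand to show only the kill-or-survive check, and all function names are hypothetical.

```python
def is_adult(age: int) -> bool:
    """Original rule under test: users aged 18 or over are adults."""
    return age >= 18


def mutant_is_adult(age: int) -> bool:
    """Mutant: the boundary operator >= has been changed to >."""
    return age > 18


def weak_test(fn) -> bool:
    """Checks only values far from the boundary."""
    return fn(30) and not fn(5)


def boundary_test(fn) -> bool:
    """Checks the values on either side of the boundary."""
    return fn(18) and not fn(17)


def kills(test, original, mutant) -> bool:
    """A test kills a mutant if it passes on the original and fails on the mutant."""
    return test(original) and not test(mutant)


print(kills(weak_test, is_adult, mutant_is_adult))      # False: mutant survives
print(kills(boundary_test, is_adult, mutant_is_adult))  # True: mutant is killed
```

A surviving mutant points to a gap in the test suite; generating mutants that resemble realistic compliance faults is what makes the resulting tests targeted.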
-
AWS Expands Well‑Architected Guidance with Data Residency and Hybrid Cloud Lens
Earlier this year, AWS launched the Well-Architected Data Residency with Hybrid Cloud Services Lens, providing guidance for hybrid cloud workloads. The lens covers data classification, operational practices, automation, and compliance, helping organizations manage data location while optimizing security, cost, and resilience.
-
Yelp Publishes Blueprint for Managing S3 Server-Access Logs at Massive Scale
In a detailed engineering post, Yelp shared how it built a scalable and cost-efficient pipeline for processing Amazon S3 server-access logs (SAL) across its infrastructure, overcoming traditional limitations of raw log storage and querying at high volume.
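Raw S3 server-access logs are space-delimited records in which the timestamp is bracketed and several fields are quoted, so even a simple query first needs a tokenizer. As a sketch of that first step, the parser below splits a line into named fields. It covers only the first 17 fields of the documented log format, uses a made-up sample line, and is not Yelp's implementation.

```python
import re

# A token is a bracketed timestamp, a double-quoted string, or a run of non-spaces.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# The first 17 fields of the S3 server access log format, in order.
FIELDS = [
    "bucket_owner", "bucket", "time", "remote_ip", "requester",
    "request_id", "operation", "key", "request_uri", "http_status",
    "error_code", "bytes_sent", "object_size", "total_time_ms",
    "turn_around_time_ms", "referer", "user_agent",
]


def parse_line(line: str) -> dict:
    """Map the leading tokens of one log line to field names; '-' becomes None."""
    record = {}
    for name, raw in zip(FIELDS, TOKEN.findall(line)):
        value = raw[1:-1] if raw[0] in '["' else raw  # drop brackets or quotes
        record[name] = None if value == "-" else value
    return record


# Hypothetical sample line; trailing fields beyond the first 17 are ignored.
line = (
    "OWNERID examplebucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 "
    "REQUESTERID 3E57427F3EXAMPLE REST.GET.OBJECT photos/cat.jpg "
    '"GET /examplebucket/photos/cat.jpg HTTP/1.1" 200 - 3462992 3462992 70 10 '
    '"-" "curl/8.5.0" -'
)
record = parse_line(line)
print(record["operation"], record["http_status"], record["time"])
```

At high volume, parsing like this is typically done once in a batch job that writes a columnar format, rather than repeatedly at query time.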
-
Enhancing Reliability Using Service-Level Prioritized Load Shedding: Netflix at QCon SF 2025
At QCon San Francisco, Netflix engineers presented their service-level prioritized load-shedding strategy, which improves reliability during traffic spikes. By prioritizing high-value requests and automating load-shedding management across microservices, they protect both the user experience and system stability. The key lessons are to prioritize traffic explicitly, automate shedding decisions, and apply load shedding in a structured way so that the most valuable requests are served first.
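The idea behind prioritized load shedding can be shown with a threshold table: as a service's utilization rises, it rejects the lowest-priority traffic first and keeps serving high-value requests for as long as possible. The priority names, the thresholds, and the use of a single utilization number are assumptions for illustration, not Netflix's configuration.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical request priorities, from most to least important."""
    CRITICAL = 0
    DEGRADED = 1
    BEST_EFFORT = 2
    BULK = 3


# Hypothetical utilization levels (fraction of capacity) at which each
# priority starts being rejected. Critical traffic is never shed by this rule.
SHED_AT = {
    Priority.CRITICAL: float("inf"),
    Priority.DEGRADED: 0.90,
    Priority.BEST_EFFORT: 0.75,
    Priority.BULK: 0.60,
}


def should_shed(priority: Priority, utilization: float) -> bool:
    """Reject the request if utilization has reached its priority's threshold."""
    return utilization >= SHED_AT[priority]


print([p.name for p in Priority if should_shed(p, 0.80)])
# ['BEST_EFFORT', 'BULK']
```

Because the thresholds rise with priority, lower-value traffic is always dropped before anything more important.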
-
Inside the Architectures Powering Modern AI Systems: QCon San Francisco 2025
Senior engineers face fast-moving AI adoption without clear patterns. QCon SF 2025 brings real-world lessons from teams at Netflix, Meta, Intuit, Anthropic, and more, showing how to build reliable AI systems at scale.
-
Pinterest Unifies Engineering Tools with New PinConsole Platform
Pinterest has introduced PinConsole, a unified internal developer platform (IDP) that centralizes engineering workflows. Built to address fragmented tools for deployment, monitoring, and service management, PinConsole provides a consistent layer that lets engineers focus on business logic instead of infrastructure complexity.
-
Uber Eats Scales Catalog Management from Restaurants to Retail with INCA Framework
Uber Eats introduced INCA (Inventory and Catalog), a scalable system for handling the vast product catalogs of supermarkets, pharmacies, and other retail partners. Unlike the earlier restaurant-focused setup, which was built for small SKU counts and simple pass-through data, INCA supports large-scale inventories, rich metadata, and the compliance requirements essential to retail operations.
-
Grab Switches from SQS and Redis to Temporal for Its Subscription Platform
Grab rebuilt GrabUnlimited, its subscription platform serving millions of users, on Temporal, replacing the previous architecture built on SQS and Redis. The new design significantly improved robustness and scalability, enhanced the user experience, and reduced production incidents by 80%, addressing a range of issues with the previous solution.
-
Figma's $300,000 Daily AWS Bill Highlights Cloud Dependency Risks
Figma's IPO filing reveals a staggering AWS spend of roughly $300,000 a day, about $100 million a year, or 12% of its $821 million in revenue. The company's deep reliance on AWS exposes it to significant risks, including potential outages and policy changes. This highlights a critical dilemma for tech firms: balancing the benefits of cloud agility against rising costs and vendor lock-in.
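The reported figures are rounded, and a quick calculation shows how they relate: $300,000 a day annualizes to about $110 million, while the $100 million figure corresponds to the stated 12% of revenue.

```python
daily_aws_spend = 300_000       # USD per day, as reported
annual_aws_spend = 100_000_000  # USD per year, as reported
annual_revenue = 821_000_000    # USD

print(f"{daily_aws_spend * 365:,}")                     # 109,500,000
print(f"{annual_aws_spend / annual_revenue:.1%}")       # 12.2%
print(f"{daily_aws_spend * 365 / annual_revenue:.1%}")  # 13.3%
```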