Scalability Content on InfoQ

News

AI, ML & Data Engineering

ProxySQL Introduces Multi-Tier Release Strategy with Stable, Innovative, and AI Tracks

ProxySQL 3.0.6 was recently released, along with a new multi-tier release strategy. The Stable Tier focuses on reliability and production use, the Innovative Tier introduces newer features earlier, and the AI/MCP Tier explores future capabilities, including AI integrations.

Renato Losio
on Mar 29, 2026
DevOps

Netflix Uncovers Kernel-Level Bottlenecks While Scaling Containers on Modern CPUs

Engineers at Netflix have uncovered deep performance bottlenecks in container scaling that trace not only to Kubernetes or containerd, but also to the CPU architecture and the Linux kernel itself.

Craig Risi
on Mar 13, 2026
Architecture & Design

From Minutes to Seconds: Uber Boosts MySQL Cluster Uptime with Consensus Architecture

Uber redesigned its MySQL fleet using a consensus-driven architecture based on MySQL Group Replication, reducing cluster failover time from minutes to seconds. By moving leader election and failure detection into the database layer, Uber improved availability, simplified external orchestration, and strengthened consistency across thousands of production clusters.

Leela Kumili
on Mar 11, 2026
Architecture & Design

uForwarder: Uber’s Scalable Kafka Consumer Proxy for Efficient Event-Driven Microservices

Uber has open-sourced uForwarder, a push-based Kafka consumer proxy built to handle trillions of messages and multiple petabytes of data daily. The system introduces context-aware routing, head-of-line blocking mitigation, adaptive auto-rebalancing, and partition-level delay processing to improve scalability, workload isolation, and hardware efficiency in large-scale event-driven microservices.

Leela Kumili
on Feb 23, 2026
Cloud

Firestore Adds Pipeline Operations with over 100 New Query Features

Google has overhauled Firestore’s query engine, introducing "Pipeline operations" that enable complex server-side aggregations and array unnesting. The update shifts Firestore Enterprise toward an optional indexing model, allowing architects to prioritize write speed and lower costs. While it brings parity with MongoDB-style aggregations, the preview currently lacks real-time and emulator support.

Steef-Jan Wiggers
on Feb 14, 2026
Architecture & Design

Airbnb Expands Global Checkout with “Pay as a Local,” Scaling to 220 Markets in 14 Months

Airbnb expands its global checkout with the “Pay as a Local” initiative, supporting over 20 locally preferred payment methods across 220 markets. The company replatformed its payments system with domain-oriented services, reusable flow archetypes, and a centralized configuration, enhancing integration speed, reliability, testing, and observability for diverse payment methods worldwide.

Leela Kumili
on Feb 02, 2026
Architecture & Design

Meta Applies Mutation Testing with LLM to Improve Compliance Coverage

Meta applies large language models to mutation testing through its Automated Compliance Hardening system, generating targeted mutants and tests to improve compliance coverage, reduce overhead, and detect privacy and safety risks. The approach supports scalable, LLM-driven test generation and continuous compliance across Meta’s platforms.

Leela Kumili
on Jan 06, 2026
Architecture & Design

AWS Expands Well-Architected Guidance with Data Residency and Hybrid Cloud Lens

Earlier this year, AWS launched the Well-Architected Data Residency with Hybrid Cloud Services Lens, providing guidance for hybrid cloud workloads. The lens covers data classification, operational practices, automation, and compliance, helping organizations manage data location while optimizing security, cost, and resilience.

Leela Kumili
on Dec 29, 2025
DevOps

Yelp Publishes Blueprint for Managing S3 Server-Access Logs at Massive Scale

In a detailed engineering post, Yelp shared how it built a scalable and cost-efficient pipeline for processing Amazon S3 server-access logs (SAL) across its infrastructure, overcoming traditional limitations of raw log storage and querying at high volume.

Craig Risi
on Dec 13, 2025
DevOps

Enhancing Reliability Using Service-Level Prioritized Load Shedding: Netflix at QCon SF 2025

At QCon San Francisco, Netflix engineers unveiled their advanced Service-Level-Prioritized Load-Shedding strategy, enhancing reliability during traffic spikes. By prioritizing high-value requests and automating management across microservices, they safeguard user experience and system stability. Key insights stress prioritization, automation, and structured load shedding for optimal resilience.

Steef-Jan Wiggers
on Nov 20, 2025
AI, ML & Data Engineering

Inside the Architectures Powering Modern AI Systems: QCon San Francisco 2025

Senior engineers face fast-moving AI adoption without clear patterns. QCon SF 2025 brings real-world lessons from teams at Netflix, Meta, Intuit, Anthropic & more, showing how to build reliable AI systems at scale. Early bird ends Nov 11.

Artenisa Chatziou
on Oct 30, 2025
Architecture & Design

Pinterest Unifies Engineering Tools with New PinConsole Platform

Pinterest has introduced PinConsole, a unified internal developer platform (IDP) that centralizes engineering workflows. Built to address fragmented tools for deployment, monitoring, and service management, PinConsole provides a consistent layer that lets engineers focus on business logic instead of infrastructure complexity.

Leela Kumili
on Sep 15, 2025
Architecture & Design

Uber Eats Scales Catalog Management from Restaurants to Retail with INCA Framework

Uber Eats introduced INCA (Inventory and Catalog), a scalable system to handle vast product catalogs from supermarkets, pharmacies, and retail partners. Unlike the earlier restaurant-focused setup built for low SKUs and simple pass-through data, INCA supports large-scale inventories, rich metadata, and compliance needs essential for retail operations.

Leela Kumili
on Aug 29, 2025
Architecture & Design

Grab Switches from SQS and Redis to Temporal for Its Subscription Platform

Grab based the new architecture for GrabUnlimited on Temporal. The company enhanced user experience and reduced production incidents by 80% for its subscription platform, which serves millions of users. The new architecture significantly improved robustness and scalability, addressing a range of issues with the previous solution.

Rafal Gancarz
on Jul 21, 2025
Cloud

Figma's $300,000 Daily AWS Bill Highlights Cloud Dependency Risks

Figma's IPO filing reveals a daily spend of about $300,000 on AWS, roughly $100 million annually, or about 12% of its $821 million revenue. The company's deep reliance on AWS exposes it to significant risks, including potential outages and policy changes. This highlights a critical dilemma for tech firms: balancing the benefits of cloud agility against rising costs and vendor lock-in.

Steef-Jan Wiggers
on Jul 09, 2025
