Performance & Scalability Content on InfoQ
-
Discord Rebuilds Database Operations Around Automation to Manage ScyllaDB at Massive Scale
Discord has detailed how it rebuilt its database operations around a new internal orchestration framework called the Scylla Control Plane (SCP), enabling its small infrastructure team to automate large-scale ScyllaDB cluster management tasks that previously took days of manual work.
-
Meta Deploys Unified AI Agents to Automate Performance Optimization at Hyperscale
Meta has unveiled a new AI-driven capacity efficiency platform that uses unified AI agents to automatically detect and resolve performance issues across its global infrastructure, marking a significant step toward self-optimizing systems at hyperscale.
-
Dropbox Redesigns Compaction to Reclaim Space from Underfilled Storage Volumes
Dropbox recently explained how it improved storage efficiency in Magic Pocket, the company's internal immutable blob store for storing user files at scale, by redesigning compaction strategies to reclaim space from severely underfilled storage volumes. The system now periodically reorganizes valid data into new volumes, allowing old, partially used ones to be cleared and reused.
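The reorganization step can be illustrated with a minimal Python sketch. It is not Dropbox's implementation: the `Volume` class, the 50% fill threshold, and the first-fit packing are all assumptions made for illustration, and blobs are assumed to fit within one volume.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical fixed-capacity volume; `live` maps blob id -> size in bytes."""
    capacity: int
    live: dict = field(default_factory=dict)

    def live_bytes(self) -> int:
        return sum(self.live.values())

    def fill_ratio(self) -> float:
        return self.live_bytes() / self.capacity

    def free(self) -> int:
        return self.capacity - self.live_bytes()


def compact(volumes, threshold=0.5):
    """Repack live blobs from underfilled volumes into fresh volumes.

    Returns (resulting_volumes, net_volumes_reclaimed). The threshold is
    an assumed tuning knob, not a value from the source.
    """
    keep = [v for v in volumes if v.fill_ratio() >= threshold]
    sparse = [v for v in volumes if v.fill_ratio() < threshold]
    packed = []
    for vol in sparse:
        for blob_id, size in vol.live.items():
            # First-fit: reuse a new volume with room, else open another one.
            target = next((p for p in packed if p.free() >= size), None)
            if target is None:
                target = Volume(capacity=vol.capacity)
                packed.append(target)
            target.live[blob_id] = size
    # Old sparse volumes hold no live data now and can be cleared and reused.
    return keep + packed, len(sparse) - len(packed)
```

Because the volumes are immutable, live data is never rewritten in place; it is copied into new volumes and the old ones are released as a whole.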
-
ProxySQL Introduces Multi-Tier Release Strategy with Stable, Innovative, and AI Tracks
ProxySQL 3.0.6 was recently released, along with a new multi-tier release strategy. The Stable Tier focuses on reliability and production use, the Innovative Tier introduces newer features earlier, and the AI/MCP Tier explores future capabilities, including AI integrations.
-
Enhancing Reliability Using Service-Level Prioritized Load Shedding: Netflix at QCon SF 2025
At QCon San Francisco, Netflix engineers presented their service-level prioritized load-shedding strategy, which improves reliability during traffic spikes. By prioritizing high-value requests and automating load-shedding management across microservices, the approach protects user experience and system stability. The key takeaways are to prioritize requests, automate, and apply load shedding in a structured way.
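As a rough illustration of prioritized load shedding, the Python sketch below drops lower-priority requests first as utilization rises. The priority names and threshold values are hypothetical and are not taken from the Netflix talk.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical request priorities, most important first."""
    CRITICAL = 0
    DEGRADED = 1
    BEST_EFFORT = 2
    BULK = 3


# Assumed utilization levels (0.0 to 1.0) above which each priority is shed.
# Lower-priority traffic is shed earlier; CRITICAL is never shed while
# utilization stays at or below 1.0.
SHED_ABOVE = {
    Priority.BULK: 0.60,
    Priority.BEST_EFFORT: 0.75,
    Priority.DEGRADED: 0.90,
    Priority.CRITICAL: 1.00,
}


def should_shed(priority: Priority, utilization: float) -> bool:
    """Return True if a request of this priority should be rejected."""
    return utilization > SHED_ABOVE[priority]
```

For example, at 70% utilization `should_shed(Priority.BULK, 0.70)` rejects the request while `should_shed(Priority.DEGRADED, 0.70)` lets it through.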
-
Meta Open Sources OpenZL: a Universal Compression Framework for Structured Data
Meta has open-sourced OpenZL, a compression framework for structured data that is reported to outperform general-purpose compressors such as Zstandard on structured datasets. A universal decompressor and custom compression plans simplify operational deployment while improving compression ratios and speeds.
-
PlanetScale Extends Database Platform to PostgreSQL
PlanetScale has announced the general availability of its managed sharded Postgres service, built for performance and reliability on AWS or Google Cloud. The launch extends PlanetScale's offerings to PostgreSQL users, adding to the company's existing popular MySQL-based platform built on top of Vitess.
-
Cloudflare Achieves 99.99% Warm Start Rate for Workers with 'Shard and Conquer' Consistent Hashing
Cloudflare's "Shard and Conquer" technique cuts the cold start rate of its serverless platform by 90%. It uses a consistent hash ring to route traffic so that Workers stay warm, minimizing latency. The approach also accommodates larger applications as users demand richer functionality.
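A consistent hash ring of this kind can be sketched in a few lines of Python. This is a generic illustration, not Cloudflare's code: the `HashRing` class, the virtual-node count, and the server and Worker names are assumptions.

```python
import bisect
import hashlib


def ring_point(key: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            point = ring_point(f"{server}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision; keep first owner
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        self._owner = {p: s for p, s in self._owner.items() if s != server}
        self._points = sorted(self._owner)

    def route(self, worker_id: str) -> str:
        """Return the server owning the first ring point after the key's hash."""
        if not self._points:
            raise LookupError("ring has no servers")
        idx = bisect.bisect(self._points, ring_point(worker_id)) % len(self._points)
        return self._owner[self._points[idx]]
```

Requests for the same Worker id always land on the same server, so that server's instance stays warm. When a server leaves the ring, only the ids it owned are remapped; all other ids keep their previous server.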
-
Uber Achieves 150M Reads per Second with CacheFront Improvements
Uber has updated its CacheFront architecture to handle over 150 million reads per second. The new design improves consistency and reduces stale reads by integrating Flux for MySQL binlog tailing, enhancing the storage engine, and introducing Cache Inspector for monitoring and optimization.
-
Anthropic Reveals Three Infrastructure Bugs behind Claude Performance Issues
Anthropic recently published a postmortem revealing that three distinct infrastructure bugs intermittently degraded the output quality of its Claude models in recent weeks. While the company states it has now resolved those issues and is modifying its internal processes to prevent similar disruptions, the community highlights the challenges of running the service across three hardware platforms.
-
Datadog Launches Monocle, a Unified Rust-Powered Real-Time Metrics Engine
Datadog has launched Monocle, a new real-time time series storage engine written in Rust. The system unifies the company’s metrics storage infrastructure, delivering higher ingestion throughput and lower query latency while reducing operational complexity. Monocle replaces several generations of storage backends, addressing concurrency challenges and scaling limits that accumulated over time.
-
Pinterest Unifies Engineering Tools with New PinConsole Platform
Pinterest has introduced PinConsole, a unified internal developer platform (IDP) that centralizes engineering workflows. Built to address fragmented tools for deployment, monitoring, and service management, PinConsole provides a consistent layer that lets engineers focus on business logic instead of infrastructure complexity.
-
Impulse, Airbnb’s New Framework for Context-Aware Load Testing
Airbnb has developed Impulse, an internal load testing framework to improve microservice reliability and performance. It enables distributed, large-scale testing and lets teams run self-service, context-aware load tests integrated with CI pipelines. By simulating production-like traffic, Impulse helps engineers identify bottlenecks and errors before changes reach production.
-
Uber Eats Scales Catalog Management from Restaurants to Retail with INCA Framework
Uber Eats introduced INCA (Inventory and Catalog), a scalable system to handle vast product catalogs from supermarkets, pharmacies, and retail partners. Unlike the earlier restaurant-focused setup, which was built for low SKU counts and simple pass-through data, INCA supports large-scale inventories, rich metadata, and the compliance needs essential for retail operations.
-
AWS Lambda Response Streaming Increases Payload Limit to 200 MB
AWS has increased the Lambda response streaming payload limit from 20 MB to 200 MB. The change lets developers stream larger data sets while improving time to first byte. It also simplifies response handling and removes the need for complex workarounds when delivering large responses from serverless applications.