Scalability Content on InfoQ
-
Enhancing Reliability Using Service-Level Prioritized Load Shedding: Netflix at QCon SF 2025
At QCon San Francisco, Netflix engineers unveiled their advanced Service-Level-Prioritized Load-Shedding strategy, enhancing reliability during traffic spikes. By prioritizing high-value requests and automating management across microservices, they safeguard user experience and system stability. Key insights stress prioritization, automation, and structured load shedding for optimal resilience.
-
Inside the Architectures Powering Modern AI Systems: QCon San Francisco 2025
Senior engineers face fast-moving AI adoption without clear patterns. QCon SF 2025 brings real-world lessons from teams at Netflix, Meta, Intuit, Anthropic & more, showing how to build reliable AI systems at scale. Early bird ends Nov 11.
-
Pinterest Unifies Engineering Tools with New PinConsole Platform
Pinterest has introduced PinConsole, a unified internal developer platform (IDP) that centralizes engineering workflows. Built to address fragmented tools for deployment, monitoring, and service management, PinConsole provides a consistent layer that lets engineers focus on business logic instead of infrastructure complexity.
-
Uber Eats Scales Catalog Management from Restaurants to Retail with INCA Framework
Uber Eats introduced INCA (Inventory and Catalog), a scalable system to handle vast product catalogs from supermarkets, pharmacies, and retail partners. Unlike the earlier restaurant-focused setup, which was built for low SKU counts and simple pass-through data, INCA supports large-scale inventories, rich metadata, and the compliance needs essential for retail operations.
-
Grab Switches from SQS and Redis to Temporal for Its Subscription Platform
Grab based the new architecture for GrabUnlimited on Temporal. The company enhanced user experience and reduced production incidents by 80% for its subscription platform, which serves millions of users. The new architecture significantly improved robustness and scalability, addressing a range of issues with the previous solution.
-
Figma's $300,000 Daily AWS Bill Highlights Cloud Dependency Risks
Figma's IPO filing reveals a staggering $300,000 daily spend on AWS, totaling roughly $100 million annually, or about 12% of its $821 million revenue. The company's deep reliance on AWS exposes it to significant risks, including potential outages and policy changes. This highlights a critical dilemma for tech firms: balancing the benefits of cloud agility against rising costs and vendor lock-in.
-
InfoQ Dev Summit Boston 2025: AI, Platforms, and Developer Experience
Software development is shifting fast. Senior engineers need real-world insights on AI, platforms, and developer autonomy. InfoQ Dev Summit Boston (June 9-10) offers 2 days with over 27 sessions of curated, technical talks delivered by engineers actively working at scale. We are focused on helping teams navigate the software evolution, with the clarity and context needed to make better decisions.
-
Stripe Rearchitects Its Observability Platform with Managed Prometheus and Grafana on AWS
Stripe replaced its observability platform, which used a third-party vendor solution, with a new architecture utilizing managed services on AWS. The company made the move due to scalability limits, reliability issues, and increasing costs while transitioning to microservices. The migration involved dual-writing metrics, translating assets, validation, and user training.
-
Netflix’s Pushy: Evolution of Scalable WebSocket Platform That Handles Hundreds of Millions of Concurrent Connections
Netflix shared details on the evolution of Pushy, a WebSocket messaging platform that supports push notifications and inter-device communication across many different devices for the company’s products. Netflix’s engineers implemented many improvements across the Pushy ecosystem to ensure the platform's scalability and reliability and support new capabilities.
-
How Amazon Aurora Serverless Manages Resources and Scaling for Fleets of 10K+ Instances
AWS engineers published a paper describing the evolution and latest design of resource management and scaling for the Amazon Aurora Serverless platform. Aurora Serverless uses a combination of components at different levels to create a holistic approach for dynamically scaling and adjusting resources to satisfy the needs of customer workloads.
-
Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day
Canva evaluated different data messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, Amazon MSK, and Amazon KDS, and eventually chose KDS, primarily because of its much lower cost. The company compared many aspects of these solutions, such as performance, maintenance effort, and cost.
-
Microsoft Introduces the Public Preview of Flex Consumption Plan for Azure Functions at Build
At the annual Build conference, Microsoft announced the Flex Consumption plan for Azure Functions, which offers fast, large-scale elastic scaling, instance size selection, private networking, availability zones, and higher concurrency control.
-
QCon London: Scaling Microservices Architecture and Technology Organization at Trainline
During the recent QCon London conference, Trainline’s CTO spoke about the evolution of the company’s system architecture and organizational structure over the last five years. The company had to adapt to market changes and growing customer expectations by improving the performance and reliability of its technology platform.
-
QCon London: How Duolingo Sent 4 Million Push Notifications in 6 Seconds During the Super Bowl Break
As part of its Super Bowl marketing campaign, Duolingo sent out 4 million mobile push notifications when the company’s five-second ad aired during the commercial break. At QCon London, Duolingo’s engineers presented the asynchronous AWS architecture responsible for broadcasting messages to millions of users across seven US cities.
-
Hashnode Creates Scalable Feed Architecture on AWS with Step Functions, EventBridge and Redis
Hashnode created a scalable event-driven architecture (EDA) for composing feed data for thousands of users. The company used serverless services on AWS, including Lambda, Step Functions, and EventBridge, along with a Redis cache. The solution leverages the Distributed Map feature of Step Functions, which enables high-concurrency processing.