Scaling Content on InfoQ
-
Enhancing Reliability Using Service-Level Prioritized Load Shedding: Netflix at QCon SF 2025
At QCon San Francisco, Netflix engineers presented their service-level prioritized load-shedding strategy for improving reliability during traffic spikes. By prioritizing high-value requests and automating the management of shedding across microservices, they protect user experience and system stability. The key takeaways: prioritize traffic by value, automate the response, and shed load in a structured way to preserve resilience.
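The talk describes the approach at a high level; as a rough illustrative sketch (not Netflix's implementation), a priority-aware shedder can map each request priority to a utilization cutoff and drop lower-value traffic first as load climbs:

```python
# Illustrative sketch only, not Netflix's implementation. It shows the general
# idea of service-level prioritized load shedding: each request carries a
# priority, and lower-priority traffic is shed first as utilization climbs.
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 3      # e.g. user-facing requests with no fallback
    DEGRADED = 2      # features with graceful fallbacks
    BEST_EFFORT = 1   # prefetching, logging, telemetry


# Hypothetical per-priority shedding thresholds (fraction of capacity in use).
SHED_ABOVE = {
    Priority.BEST_EFFORT: 0.70,
    Priority.DEGRADED: 0.85,
    Priority.CRITICAL: 0.95,
}


def should_shed(priority: Priority, utilization: float) -> bool:
    """Shed the request if current utilization exceeds its priority's cutoff."""
    return utilization > SHED_ABOVE[priority]


# At 80% utilization, best-effort traffic is shed while critical traffic passes.
assert should_shed(Priority.BEST_EFFORT, 0.80) is True
assert should_shed(Priority.CRITICAL, 0.80) is False
```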
-
Advanced Autoscaling Helps Companies Reduce AWS Costs by 70%
The next generation of Kubernetes autoscaling techniques and tools is enabling organisations to make substantial cost savings in their cloud infrastructure. Svetlana Burninova recently used Karpenter to build a multi-architecture EKS cluster, achieving a 70% reduction in cost whilst also improving performance.
-
Amazon DocumentDB Serverless: Auto-Scaling Database Solution for Variable Workloads
AWS has launched Amazon DocumentDB Serverless, an auto-scaling, MongoDB-compatible database tailored for variable workloads. While marketed as "serverless," it functions more like auto-scaling, with pricing starting from $30/month. Aimed at enterprises and SaaS vendors, it is designed to handle spikes in demand, particularly for AI-driven applications.
-
Inflection Points in Engineering Productivity for Improving Productivity and Operational Excellence
As companies grow, investing in custom developer tools may become necessary. Initially, standard tools suffice, but as companies scale in engineering headcount, maturity, and complexity, industry tools may no longer meet their needs. Inflection points, such as a crisis, hyper-growth, or reaching a new market, often trigger these investments, providing opportunities to improve productivity and operational excellence.
-
Lessons Learned from Growing an Engineering Organization
As their organization grew, Thiago Ghisi's work as director of engineering shifted from being hands-on in emergencies to designing frameworks and delegating decisions. He suggested treating changes as experiments, documenting reorganizations, and using a wave-based communication approach to gather feedback, ensuring people feel heard and invested.
-
Optimizing Amazon ECS with Predictive Scaling
Amazon Web Services (AWS) recently released Predictive Scaling for Amazon ECS, an advanced scaling policy that employs machine learning (ML) algorithms to anticipate demand surges, ensuring applications remain highly available and responsive while minimizing resource overprovisioning.
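The following is only a conceptual sketch of what a predictive policy does, not the AWS implementation or API: forecast demand from historical load, then provision capacity ahead of the expected surge rather than reacting to it. The seasonal-average forecast and all numbers are illustrative assumptions.

```python
# Conceptual sketch of predictive scaling: forecast demand from historical
# load, then provision capacity before the surge instead of reacting after
# utilization has already climbed. Not the AWS implementation or API.
from math import ceil


def forecast_next_hour(history: list[float], period: int = 168) -> float:
    """Naive seasonal forecast: average the load seen one, two, ... periods
    before the hour being forecast (a period of 168 hours = one week).
    AWS's policy instead applies ML models to historical CloudWatch metrics."""
    samples = [history[i] for i in range(len(history) - period, -1, -period)]
    return sum(samples) / len(samples) if samples else history[-1]


def tasks_for(load: float, per_task_capacity: float, headroom: float = 1.1) -> int:
    """Tasks to pre-provision for the forecast load, with a safety margin."""
    return ceil(load * headroom / per_task_capacity)


# Toy data: two weeks of hourly request rates, the most recent week busier.
history = [1000.0] * 168 + [1500.0] * 168
desired_tasks = tasks_for(forecast_next_hour(history), per_task_capacity=200.0)
print(desired_tasks)   # -> 7 tasks provisioned before the surge arrives
```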
-
Staying Innovative on a Journey from Start-Up to Scale-Up
As ClearBank grew, it faced the challenge of maintaining its innovative culture while introducing more structured processes to manage its expanding operations and ensure regulatory compliance. Within boundaries of accountability and responsibility, teams were given space to evolve their own areas, experiment, and continuously improve, which helped the company stay innovative.
-
Deezer Optimizes Kubernetes Autoscaling with Custom Metrics
Popular music streaming service Deezer has written about using custom metrics to enable auto-scaling in its Kubernetes infrastructure. Server utilisation and performance issues made scaling applications to an appropriate size and number of replicas challenging, and Kubernetes' Horizontal Pod Autoscaler (HPA) alone didn't solve these issues, so Deezer turned to custom metrics.
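The HPA applies the same scaling rule to custom metrics as it does to CPU: desired replicas follow the ratio of the observed metric to its target. Below is a minimal sketch of that documented formula, using a hypothetical per-pod queue-depth metric; Deezer's actual metrics and targets are not shown in the summary.

```python
# The Horizontal Pod Autoscaler's core rule (from the Kubernetes docs) works
# the same for custom metrics as for CPU: scale the replica count by the ratio
# of the current metric value to its target. The metric here is hypothetical.
from math import ceil


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric).

    Within the tolerance band the HPA leaves the replica count unchanged.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return ceil(current_replicas * ratio)


# 4 replicas, 30 queued jobs per pod against a target of 10 -> scale to 12.
print(desired_replicas(4, current_metric=30, target_metric=10))
```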
-
Kubernetes Autoscaler Karpenter Reaches 1.0 Milestone
Amazon Web Services (AWS) has released version 1.0 of Karpenter, an open-source Kubernetes cluster auto-scaling tool. This release marks Karpenter's graduation from beta status and introduces stable APIs and several new features. Karpenter, initially launched in November 2021, has evolved into a comprehensive Kubernetes-native node lifecycle manager.
-
How Tech-Enabled Networks of Software Teams Work
To maintain agility at scale, software teams can use technological and organizational solutions to reduce dependencies and work autonomously. According to Fabrice Bernhard, collaboration technology can be leveraged to create a distributed network of teams. To empower their teams, leaders can support them with a systematic problem-solving culture aimed at delivering good products to customers.
-
How to Build Large Scale Cyber-Physical Systems
To build large-scale safety-critical systems, we need to decompose the system into smaller solvable problems, resolve what is known, and work through the unknowns with experiments, Robin Yeman argued. She suggested investing early in test environments for both software and hardware, enabling a test-driven approach that increases the safety, security, reliability, and availability of these systems.
-
Expedia Open-Sources Container-Startup-Autoscaler (CSA) for Scaling Kubernetes Workloads
Expedia's Performance and Reliability team has recently open-sourced its container-startup-autoscaler (CSA). It is a Kubernetes controller leveraging the In-Place Update of Pod Resources feature to dynamically adjust CPU and/or memory resources of containers during startup based on user-defined startup/post-startup configurations.
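The sketch below is not CSA itself, only a conceptual illustration of the pattern the controller implements: reconcile a container towards a generous startup allocation until it has started, then resize it in place to leaner post-startup values; the resource figures are hypothetical.

```python
# Conceptual illustration of the CSA pattern, not its actual implementation:
# give a container generous CPU/memory while it starts, then resize it in place
# to leaner post-startup values once startup completes, relying on Kubernetes'
# In-Place Update of Pod Resources feature to apply the change without restart.
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_millicores: int
    memory_mib: int


# Hypothetical user-defined startup/post-startup configuration for a container.
STARTUP = Resources(cpu_millicores=2000, memory_mib=2048)
POST_STARTUP = Resources(cpu_millicores=500, memory_mib=1024)


def target_resources(container_started: bool) -> Resources:
    """Pick the resource target the controller should reconcile towards."""
    return POST_STARTUP if container_started else STARTUP


# During startup the controller requests the larger allocation ...
print(target_resources(container_started=False))  # Resources(cpu_millicores=2000, ...)
# ... and scales the container down in place once startup has completed.
print(target_resources(container_started=True))   # Resources(cpu_millicores=500, ...)
```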
-
DigitalOcean Introduces CPU-Based Autoscaling for its App Platform
DigitalOcean has launched automatic horizontal scaling for its App Platform PaaS, aiming to free developers from the burden of scaling services up or down based on CPU load themselves.
-
How to Create a UI That's Both Robust and User Friendly
The key challenge in building UIs is balancing ease of use and maintainability with scale and complexity. It requires thoughtful component design and an understanding of common usage paths to create a UI that's both robust and user-friendly. Automation can be a game-changer for improving efficiency and consistency in your codebase.
-
Developing Software to Manage Distributed Energy Systems at Scale
Functional programming techniques can make software more composable, reliable, and testable. For systems at scale, trade-offs in edge vs. cloud computing can impact speed and security.
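As a generic illustration of the composability claim (not code from the talk), small pure functions over readings can be chained into a pipeline and each step tested in isolation; the processing steps below are hypothetical.

```python
# Generic illustration, not from the talk: small pure functions are easy to
# compose into a pipeline and to test in isolation, which is the composability
# and testability benefit the summary refers to.
from functools import reduce
from typing import Callable, Iterable

Transform = Callable[[float], float]


def compose(*steps: Transform) -> Transform:
    """Compose unary functions left to right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


# Hypothetical processing steps for a meter reading given in watts.
def to_kilowatts(watts: float) -> float:
    return watts / 1000.0


def clamp_negative(kilowatts: float) -> float:
    return max(kilowatts, 0.0)


process_reading: Transform = compose(to_kilowatts, clamp_negative)


def process_all(readings: Iterable[float]) -> list[float]:
    return [process_reading(r) for r in readings]


print(process_all([1500.0, -20.0, 250.0]))   # -> [1.5, 0.0, 0.25]
```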