Performance & Scalability Content on InfoQ
-
Read-Copy-Update (RCU): the Secret to Lock-Free Performance
Read-Copy-Update (RCU) is a technique for optimizing concurrency that can boost read performance by over 110% in read-heavy workloads. This article covers applying RCU principles across production systems to improve architectural efficiency and streamline data handling, maximizing scalability while minimizing overhead.
-
Proactive Autoscaling for Edge Applications in Kubernetes
Kubernetes often reacts too late when traffic suddenly increases at the edge. A proactive scaling approach that considers response time, spare CPU capacity, and container startup delays can add or remove instances more smoothly, prevent sudden spikes, and keep performance stable on systems with limited resources.
-
When Reverse Proxies Surprise You: Hard Lessons from Operating at Scale
Operating massive reverse proxy fleets reveals hard lessons: optimizations that work on smaller systems fail at scale; mundane oversights like missing commas cause major outages; and abstractions meant to simplify become hidden fragility points. Success requires profiling on target hardware, relentlessly monitoring boring details, keeping hot paths lean, and trusting instrumentation over theory.
-
Analyzing Apache Kafka Stretch Clusters: WAN Disruptions, Failure Scenarios, and DR Strategies
This article analyzes the dynamics of Apache Kafka stretch clusters, assessing WAN disruptions and failure scenarios and outlining Disaster Recovery (DR) strategies. The focus is on maintaining high availability and data integrity across multi-region deployments and on protecting vital services against service level agreement violations.
-
Designing Resilient Event-Driven Systems at Scale
Learn how to design resilient event-driven systems that scale. Explore key patterns like shuffle sharding and decoupling queues to handle load spikes and failures. Understand common pitfalls like over-relying on retries and neglecting observability for robust, scalable architectures.
-
Transforming Legacy Healthcare Systems: a Journey to Cloud-Native Architecture
Discover how Livi navigated the complexities of transitioning MJog, a legacy healthcare system, to a cloud-native architecture, sharing valuable insights for successful tech modernization. Our experience illustrates that transitioning from legacy systems to cloud-based microservices is not a one-time project, but an ongoing journey.
-
How Netflix Ensures Highly-Reliable Online Stateful Systems
Building reliable stateful services at scale isn’t a matter of building reliability into the servers, the clients, or the APIs in isolation. By combining smart and meaningful choices for each of these three components, we can build massively scalable, SLO-compliant stateful services at Netflix.
-
Magic Pocket: Dropbox’s Exabyte-Scale Blob Storage System
A horizontally scalable exabyte-scale blob storage system which operates out of multiple regions, Magic Pocket is used to store all of Dropbox’s data. Adopting SMR technology and erasure codes, the system has extremely high durability guarantees but is cheaper than operating in the cloud.
-
Design Pattern Proposal for Autoscaling Stateful Systems
In this article, Rogerio Robetti discusses the challenges of autoscaling stateful storage systems and proposes an opinionated design for automatically scaling up (vertically) and scaling out (horizontally), from a single node to several nodes in a cluster, with minimal configuration and operator intervention.
-
A Recipe to Migrate and Scale Monoliths in the Cloud
In this article, I want to present a simple cloud architecture that can allow an organization to take monolithic applications to the cloud incrementally without a dramatic change in the architecture. We will discuss the minimal requirements and basic components to take advantage of the scalability of the cloud.
-
Using the Plan-Do-Check-Act Framework to Produce Performant and Highly Available Systems
The PDCA (plan-do-check-act) framework can be used to structure the performance, availability, and monitoring work that teams need to deliver performant and highly available applications. That work spans infrastructure design and setup, application architecture and design, coding, performance testing, and application monitoring.
-
Donkey: a Highly-Performant HTTP Stack for Clojure
Donkey is the product of our quest for a highly performant Clojure HTTP stack, designed to scale with the rapid growth we have been experiencing at AppsFlyer and to save us computing costs. In this article, we'll briefly outline the use case for a library like Donkey and present our benchmarks. Finally, we will discuss Clojure and immutability, and some of our design decisions.