Twitter recently shared the details of why their RPC framework Finagle implements client-side load balancing using a deterministic aperture algorithm for their microservices architecture. Twitter ran different experiments but confirmed that with a deterministic approach, requests are better distributed, connections count reduces drastically, and they also need less infrastructure.
Twitter has been using a client-side load balancing technique for several years with its microservices architecture. They call this technique a "deterministic aperture," and it's part of Finagle's RPC framework, an open-source project for the JVM. Finagle embeds a client-side load balancer in every client. Instead of making calls to a central server-side load balancer, all requests go straight to a destination server, without an intermediary. This reduces the need for an extra infrastructure layer, and also reduces network hops, bandwidth, and points of failure in the system. Client-side load balancing is an approach that other projects like Baker Street and Netflix Ribbon use. And also companies like Yelp, Airbnb, or Stripe use it to run microservices systems.
Using client-side load balancers means that now there can be multiple load balancers distributed within clients throughout the system—at least one per client. Therefore, it gets complicated when trying to distribute traffic load to servers in an even manner, especially when there are thousands of servers. For this reason, Finagle's deterministic aperture algorithm combines the P2C (power of two random choices) approach for distributing traffic load with the combination of a deterministic approach when picking which servers to connect. P2C is a strategy based on Michael Mitzenmacher's paper "The power of two choices in randomized load balancing" where a load balancer picks a fraction of severs from the cluster randomly, then route requests to the least loaded server.
To explain how the determinist aperture algorithm works, Twitter represents the client and backend servers as rings in equidistant intervals.
Source: SREcon19 Americas - Aperture: A Non-Cooperative, Client-Side Load Balancing Algorithm
As a first step, the load balancer must pick a subset of servers to connect, which is the aperture size. This subset only represents to which servers the load balancer will connect to then route requests. At a minimum, the aperture size will be 1/N, where N is the number of servers—as seen in the previous diagram. But this can be configured differently in Finagle by using a feedback controller to get a better aperture size.
Once the aperture size is settled, all requests are distributed evenly within the servers, but this distribution doesn't necessarily mean that it routes requests equally—it might not be fair for the whole system. That is, for 100% of requests within three servers, one server could receive 10%, another 60%, and the other 30%. The aperture size could change when the cluster needs to add or remove servers. This algorithm implements light coordination between clients and minimal disruption in the whole system.
Source: SREcon19 Americas - Aperture: A Non-Cooperative, Client-Side Load Balancing Algorithm
Before coming to the solution of a deterministic aperture algorithm, Twitter was picking the subset of servers randomly, allowing them to cut down the number of connections. But, they noticed that the request load wasn't being distributed fairly. For instance, there were more occupied servers receiving 400 requests per second, whereas others were idle with only a small fraction of requests as seen in the below diagram:
Results after running the random aperture algorithm in production
At SRECon. Ruben Oanta from Twitter shared the results from running Finagle's deterministic aperture algorithm in production. First, requests were more evenly distributed across servers, which helped them to make efficient use of resources. Then, Twitter reduced by 91% the number of connections to the destination servers. Finally, they had fewer request failures, latency was reduced at the 99.9 percentile, there was a 25% reduction in CPU usage, and the total amount of Java garbage collection (GC) was cut in half.
However, the deterministic aperture has a few limitations. For instance, for small clusters, the number of connections is often higher than with a large cluster, due to the minimum aperture size. Also, if traffic is "bursty", as often occurs after a cache flush, some server apertures end up receiving more traffic than others.
Users can learn more about how to get started with Finagle and its aperture load balancer in the project site or the GitHub repository.