Google Maglev: A Load Balancer on Commodity Servers

by Abel Avram on Mar 17, 2016

A group of engineers from Google, UCLA, and SpaceX is presenting the paper Maglev: A Fast and Reliable Software Network Load Balancer (PDF) at the 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI '16), taking place this week. Maglev is Google’s network load balancer.

Unlike dedicated load-balancing hardware, Maglev is a software solution running on commodity servers. Instead of acquiring specialized hardware ahead of time to provide enough capacity for traffic peaks, Google runs Maglev on regular servers, adding more of them to the pool as demand grows. Maglev was developed in-house by Google for its own data centers and has been used in production since 2008.

Google services run in clusters in multiple data centers spread around the world. Each cluster has a load balancer consisting of multiple devices placed between the routers and the servers providing services. Dedicated load balancers are usually deployed in active-passive pairs to provide 1+1 redundancy, which leaves one of the pair idle and its capacity unused. They are also limited to a fixed capacity and are hard or impossible to reprogram. Google decided instead to use a configuration providing N+1 redundancy, built on its own software and commodity servers, for better scalability and flexibility, as shown in the following graphic.

[Figure: Maglev]
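
The capacity argument behind 1+1 versus N+1 redundancy is simple arithmetic, and a small sketch may make it concrete. The function name and the example pool sizes below are illustrative assumptions, not figures from the paper: the sketch only computes what share of deployed capacity can be loaded while still tolerating the loss of the spare machines.

```python
def usable_fraction(active: int, spares: int = 1) -> float:
    """Share of deployed capacity that can be loaded while still
    tolerating the failure of `spares` machines."""
    if active < 1 or spares < 0:
        raise ValueError("need at least one active machine and no negative spares")
    return active / (active + spares)


if __name__ == "__main__":
    # Hypothetical pool sizes: 1+1 is the classic active-passive pair.
    for active in (1, 4, 9):
        share = usable_fraction(active)
        print(f"{active}+1 redundancy: {share:.0%} of deployed capacity usable")
```

Under these assumptions an active-passive pair can use only half of what is deployed, while the usable share approaches the full capacity as N grows.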

Regarding performance, a single Maglev server can “saturate a 10Gbps link with small packets. Maglev is also equipped with consistent hashing and connection tracking features, to minimize the negative impact of unexpected faults and failures on connection-oriented protocols.” Google Cloud uses Maglev to serve 1M requests/sec within 5 seconds of setup and without pre-warming. In a performance benchmark conducted by Google, a Maglev instance running on one 8-core CPU topped out at 12M pps (packets per second). Maglev bypasses the Linux kernel network stack, which would otherwise slow it down to less than 4M pps.
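
To give a feel for the consistent hashing and connection tracking mentioned in the quote, here is a minimal sketch along the lines of the lookup-table population scheme described in the paper. It is not Google's implementation: the SHA-256 based hash helper, the table size of 251, the `Balancer` class, and the backend names are all assumptions chosen for illustration, and the sketch ignores backend weights and backend-set changes.

```python
import hashlib


def _hash(data: str, salt: str) -> int:
    """Illustrative hash helper (an assumption, not the paper's hash functions)."""
    digest = hashlib.sha256((salt + ":" + data).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_table(backends, size=251):
    """Fill a lookup table of `size` slots, backends taking turns.

    `size` must be prime so that each backend's (offset, skip) sequence
    visits every slot exactly once.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    offsets = [_hash(b, "offset") % size for b in backends]
    skips = [_hash(b, "skip") % (size - 1) + 1 for b in backends]
    position = [0] * len(backends)  # next index in each backend's preference list
    table = [None] * size
    filled = 0
    while True:
        for i, backend in enumerate(backends):
            # Walk this backend's preference list to its first free slot.
            slot = (offsets[i] + position[i] * skips[i]) % size
            while table[slot] is not None:
                position[i] += 1
                slot = (offsets[i] + position[i] * skips[i]) % size
            table[slot] = backend
            position[i] += 1
            filled += 1
            if filled == size:
                return table


class Balancer:
    """Connection tracking layered on top of the lookup table."""

    def __init__(self, backends, size=251):
        self.table = build_table(backends, size)
        self.connections = {}  # flow 5-tuple -> chosen backend

    def pick(self, flow):
        backend = self.connections.get(flow)
        if backend is None:
            # New flow: fall back to the consistent-hashing table.
            backend = self.table[_hash(repr(flow), "flow") % len(self.table)]
            self.connections[flow] = backend
        return backend


if __name__ == "__main__":
    lb = Balancer(["backend-a", "backend-b", "backend-c"])  # hypothetical names
    flow = ("198.51.100.7", 40312, "203.0.113.10", 443, "tcp")
    print(lb.pick(flow), lb.pick(flow))  # same backend both times
```

Because the backends take turns claiming slots, their shares of the table differ by at most one slot, and the tracking dictionary keeps an established flow on the backend it was first assigned to.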

The paper presents in detail how a request is processed by Maglev, how virtual IP addresses are handled, how routers spread packets across Maglev machines using Equal Cost Multipath (ECMP), and how hashing is then used to direct each request to a service endpoint, among other topics.
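
ECMP is commonly implemented by hashing a packet's 5-tuple and using the result to choose among several equal-cost next hops, so that all packets of one flow take the same path. The following is a rough sketch of that idea only: the `FiveTuple` type, the `ecmp_next_hop` function, the SHA-256 hash, and the machine names are assumptions for illustration, and real routers do this in hardware with their own hash functions.

```python
import hashlib
from typing import NamedTuple, Sequence


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def ecmp_next_hop(flow: FiveTuple, next_hops: Sequence[str]) -> str:
    """Pick one of several equal-cost next hops from a hash of the flow."""
    if not next_hops:
        raise ValueError("at least one next hop is required")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:8], "big") % len(next_hops)]


if __name__ == "__main__":
    machines = ["maglev-1", "maglev-2", "maglev-3"]  # hypothetical next hops
    flow = FiveTuple("198.51.100.7", 40312, "203.0.113.10", 443, "tcp")
    print(ecmp_next_hop(flow, machines))
```

Note that a plain modulo over the next-hop list reshuffles most flows whenever the list changes, which is one reason a load balancer behind ECMP benefits from consistent hashing and connection tracking of its own.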
