How Shopify Implements Custom Autoscaling Rules in Kubernetes

Andy Kwiatkowski from Shopify spoke at the Velocity conference in Berlin about why the company had to build a custom autoscaler for Kubernetes. Existing autoscaling solutions didn’t meet Shopify’s needs, mainly because of the large, sudden influxes of traffic it receives. Shopify also needed a cost-efficient way to scale down and the ability to configure complex scaling conditions.

Kwiatkowski said that Shopify’s website receives large and sudden influxes of requests during sales campaigns, or "flash sales." Flash sales typically run for a short period, such as fifteen or twenty minutes, so Shopify needs to scale fast. Reactive scaling doesn’t work for them, mainly because scaling up involves spinning up new nodes, downloading Docker images, and booting up workloads such as DaemonSets. As a result, scaling up could take anywhere from two to twenty minutes. By the time the autoscaler has finished adding capacity, the flash sale may already be over.

Shopify built a custom autoscaler in Go, which is not yet open source, to handle its sudden, spiky traffic. The company also needed better controls for safe deployments and for configuring more complex scaling conditions, such as ones based on past data. The autoscaler runs every thirty seconds and adds the replicas that an upcoming flash sale needs.

Scaling up or down also affects the monthly cloud bill, so the autoscaler needs to make informed decisions. To determine how many replicas the cluster needs, Shopify weighs risk against cost using a formula taken from the Kubernetes Horizontal Pod Autoscaler (HPA). Shopify defines how busy it wants servers to be (the target utilization). Then, based on how busy the servers currently are and how many replicas exist, the formula gives the number of replicas the cluster should have. The goal is to keep the cluster at its target utilization at all times.
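The HPA rule documented by Kubernetes is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The following is a minimal Python sketch of that rule only. The function and parameter names are illustrative assumptions, not Shopify's code, and Shopify's autoscaler may apply further adjustments not described in the talk.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float) -> int:
    """Kubernetes HPA rule: ceil(currentReplicas * currentMetric / targetMetric).

    Utilization values are percentages (for example, 90 means 90% CPU).
    All names here are hypothetical, chosen for illustration.
    """
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    return math.ceil(current_replicas * current_utilization / target_utilization)


# Ten replicas running at 90% CPU with a 60% target: scale up.
print(desired_replicas(10, 90, 60))  # 15
# Ten replicas running at 30% CPU with a 60% target: scale down.
print(desired_replicas(10, 30, 60))  # 5
```

A lower target utilization leaves more headroom for spikes (less risk) but keeps more replicas running (more cost), which is the trade-off the target expresses.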

Scaling down a cluster takes time, which increases costs. So, to make scaling cost-efficient, Shopify improved its autoscaler by analyzing past traffic data. After running some experiments, Shopify noticed that when the scaling rule was based on the average CPU utilization, as in other solutions, it couldn’t anticipate spikes accurately. Using the median CPU utilization gave better results, despite momentarily leaving extra capacity in place. However, during longer spikes (around thirty minutes), the autoscaler wasn’t adding more replicas. To solve this problem, they used an exponentially weighted average (EWA) of CPU utilization, in which newer values carry more weight than older ones. As a result, the autoscaler adds many more replicas quickly.
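To make the difference between the three statistics concrete, here is a small Python sketch that compares the mean, the median, and an EWA over two made-up windows of CPU samples. The sample values, the smoothing factor `alpha`, and the function name `ewa` are assumptions for illustration; the talk summary does not give Shopify's window size or weights.

```python
from statistics import mean, median


def ewa(samples, alpha=0.5):
    """Exponentially weighted average of samples ordered oldest to newest.

    Each new sample gets weight alpha; the running average keeps (1 - alpha).
    alpha=0.5 is an arbitrary illustrative choice.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    average = samples[0]
    for value in samples[1:]:
        average = alpha * value + (1 - alpha) * average
    return average


# Hypothetical CPU utilization percentages, oldest first.
short_blip = [40, 40, 40, 40, 40, 40, 90, 40]   # one noisy sample
sustained = [40, 40, 40, 40, 40, 90, 90, 90]    # a spike that persists

for name, window in (("short blip", short_blip), ("sustained", sustained)):
    print(name, mean(window), median(window), ewa(window))
# short blip 46.25 40.0 52.5
# sustained 58.75 40.0 83.75
```

In the sustained window the median still reports 40 because most samples predate the spike, while the EWA has already climbed to 83.75. This mirrors the behavior described above: the median ignores a lasting spike for too long, and the EWA reacts to it quickly.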

Shopify’s autoscaler calculates both the median and the EWA CPU utilization. If there’s no significant difference between them, the autoscaler uses the median CPU utilization. Otherwise, it uses the EWA CPU utilization. This way, Shopify adds only the replicas it needs, when it needs them.
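The selection rule can be sketched in a few lines of Python. What counts as a "significant difference" is not stated in the talk summary, so the `threshold` of ten percentage points below is purely an assumption, as are the function and parameter names.

```python
def choose_utilization(median_cpu: float,
                       ewa_cpu: float,
                       threshold: float = 10.0) -> float:
    """Pick the CPU utilization figure to feed into the scaling formula.

    Uses the median unless the EWA differs from it by more than `threshold`
    percentage points. The threshold value is a hypothetical placeholder.
    """
    if abs(ewa_cpu - median_cpu) <= threshold:
        return median_cpu
    return ewa_cpu


# Steady traffic: the two figures agree, so the median is used.
print(choose_utilization(median_cpu=40.0, ewa_cpu=45.0))   # 40.0
# Sustained spike: the EWA has pulled away, so it takes over.
print(choose_utilization(median_cpu=40.0, ewa_cpu=83.75))  # 83.75
```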

Finally, Kwiatkowski said that the data they collect sometimes contains errors such as null values, zero values, stale data, or sparse data. To avoid problems when scaling up or down, the autoscaler always scales to maximum capacity whenever it detects a data error. Additionally, they configure a minimum replica count to prevent problems when scaling down.
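These safety rules could look roughly like the Python sketch below. It treats null, zero, stale, or sparse samples as a data error and returns the maximum replica count in that case; otherwise it applies the HPA-style formula and clamps the result between the configured bounds. The staleness limit, the minimum sample count, the use of the median as input, and all names are assumptions made for illustration.

```python
import math
import time
from statistics import median


def has_data_error(samples, now, max_age_seconds=120.0, min_samples=3):
    """samples is a list of (unix_timestamp, cpu_percent) pairs, oldest first.

    The limits are hypothetical defaults, not values from the talk.
    """
    if len(samples) < min_samples:                        # sparse data
        return True
    if any(v is None or v <= 0 for _, v in samples):      # null or zero values
        return True
    newest_timestamp = samples[-1][0]
    return now - newest_timestamp > max_age_seconds       # stale data


def safe_replica_count(samples, current_replicas, target_cpu,
                       min_replicas, max_replicas, now=None):
    """Return a replica count that fails safe when the metrics look wrong."""
    now = time.time() if now is None else now
    if has_data_error(samples, now):
        return max_replicas                               # fail safe: scale to max
    cpu = median(v for _, v in samples)
    desired = math.ceil(current_replicas * cpu / target_cpu)
    return max(min_replicas, min(max_replicas, desired))  # enforce bounds


healthy = [(940, 30), (970, 30), (1000, 30)]
broken = [(940, 30), (970, None), (1000, 30)]
print(safe_replica_count(healthy, 10, 60, 8, 50, now=1000))  # 8 (floor applies)
print(safe_replica_count(broken, 10, 60, 8, 50, now=1000))   # 50 (data error)
```

In the healthy case the formula alone would ask for five replicas, but the configured minimum of eight keeps the cluster from shrinking too far.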
