Safe and Fast Deploys at Planet Scale: QCon Plus Q&A


Uber has automated the deployment of services using a hybrid cloud model. All services are deployed using the same rollout techniques and workflows, ensuring safe deployment and mitigation of any issues. Abstracting away the differences between clouds supports engineers in building services that run on any platform.

At QCon Plus 2020, Mathias Schwarz, a software engineer at Uber, presented how Uber scaled from a small engineering team using a single datacenter to thousands of engineers who continuously deploy changes across multiple cloud platforms.

Every week, thousands of Uber engineers push out several thousand changes, Schwarz said. During working hours, some part of the Uber system starts upgrading every single minute. The system never runs one single version across the host fleet.

Schwarz explained how Uber continuously deploys changes across multiple cloud platforms:

We have built our deploy systems µDeploy and its replacement Up to span several cloud providers and understand the differences so our engineers don’t have to. We structure our system into zones that are each backed by either a public cloud provider or by our own physical hardware. This allows the infrastructure team to manage our cloud usage and enable our engineers in building services that run on any cloud platform by abstracting away the differences between them.

Continuous, high-frequency deploys have shaped the way Uber engineering has worked since the early years of the company. Schwarz mentioned that the service teams have a high degree of autonomy when it comes to deploying their changes to production.

On average, Uber’s roughly 4000 services are deployed around 5000 times to production per week. Rapid deploys have enabled teams to deliver changes to their users quickly and respond rapidly to changes in the markets that Uber serves, Schwarz said.

InfoQ interviewed Mathias Schwarz about how Uber does structured deploys, how it automates deployment and auto scaling, and how it has automated service management.

InfoQ: How do you do structured deploys with µDeploy?

Mathias Schwarz: In µDeploy we started deploying all our services using the same rollout techniques and workflows. This meant that we could build a single way to roll out services and improve that over time to the benefit of all our roughly 4000 services. We were able to guarantee that services could roll out with a high level of safety and we built rollback functionality into the system as a fundamental feature. This meant that our engineers could trust that the system would handle their deploys for them and that they would have fast mitigation if they saw any unexpected behavior.
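The idea of a single rollout workflow with rollback built in can be illustrated with a small sketch. This is not Uber's implementation: the function name `staged_rollout`, the zone names, and the `is_healthy` callback are all hypothetical, and the sketch assumes a rollout proceeds one zone at a time and reverts every zone it touched as soon as a health check fails.

```python
from typing import Callable, Dict, List


def staged_rollout(
    zones: List[str],
    current: Dict[str, str],
    new_version: str,
    is_healthy: Callable[[str, str], bool],
) -> bool:
    """Upgrade zones one at a time; revert every touched zone on failure.

    `current` maps zone -> running version and is updated in place.
    Returns True if all zones run `new_version`, False if rolled back.
    """
    previous: Dict[str, str] = {}  # zone -> version before this rollout
    for zone in zones:
        previous[zone] = current[zone]
        current[zone] = new_version
        if not is_healthy(zone, new_version):
            # Fast mitigation: restore the last known-good version
            # in every zone this rollout has touched so far.
            for touched, old_version in previous.items():
                current[touched] = old_version
            return False
    return True
```

Because every service would go through the same function, an improvement to the health check or the rollback step benefits all services at once, which is the point Schwarz makes about a shared workflow.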

InfoQ: What were the things in µDeploy that you hadn’t automated and are automating now in Up?

Schwarz: It was a result of us growing. We automated certain things in µDeploy: we automated rollouts, and we already had safe rollouts across zones. The things that we do now relate to also automating placement. Uber runs a hybrid cloud model where we own some zones as on-prem capacity and we have some cloud capacity at Amazon and Google Cloud. One of the things we’re automating is placement and the moving of services between cloud and on-prem as well as between clouds. We are also applying auto scaling at a large scale. We didn’t previously have that.

InfoQ: Previously did you not have auto scaling? Or was it just not as "auto" as you wanted it to be?

Schwarz: We had some auto scaling, but it wasn’t fully automated. We’re moving to a hybrid cloud model where previously we were mostly on-prem. You can more easily scale your capacity in the cloud than you can if you have to buy and manage the hardware yourself, so auto scaling becomes more important when you are in the cloud. We have started doing auto scaling based on a combination of business metrics (such as how many customers use the platform) and technical metrics (such as the current CPU usage).
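A scaling decision that combines a business metric with a technical metric could look roughly like the sketch below. The formula, the function name `desired_instances`, and every parameter and default value are assumptions made for illustration; the interview does not describe how Uber actually weighs the two signals. The sketch computes one instance count from customer demand and one from CPU utilization, then takes the larger of the two.

```python
import math


def desired_instances(
    active_customers: int,
    customers_per_instance: int,
    current_instances: int,
    cpu_utilization: float,
    target_cpu: float = 0.5,
    min_instances: int = 2,
    max_instances: int = 1000,
) -> int:
    """Return an instance count that satisfies both signals."""
    # Business signal: capacity needed for the customers on the platform.
    by_demand = math.ceil(active_customers / customers_per_instance)
    # Technical signal: resize so average CPU moves toward the target.
    by_cpu = math.ceil(current_instances * cpu_utilization / target_cpu)
    # Take the more demanding signal, then clamp to the allowed range.
    return max(min_instances, min(max_instances, max(by_demand, by_cpu)))
```

Taking the maximum is a conservative choice: the service scales up when either signal asks for more capacity, and scales down only when both agree.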

InfoQ: How have you automated service management?

Schwarz: We realized that even though we had automated the daily service operations around rollout, we needed more than that to gain the full benefit of our multi-cloud system. We could relatively easily set up a new zone with a cloud provider, but it would take many people many weeks to migrate a meaningful portion of our backend services into a new zone through a highly manual and laborious process. With Up, we have automated our management of zone placement for services so that we no longer need to spend engineering time on it. In addition, we have started auto scaling our services as mentioned above.
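To make automated zone placement concrete, here is a minimal sketch under stated assumptions. The interview does not say how Up chooses zones, so the greedy strategy, the function name `place_services`, and the use of CPU cores as the only resource are hypothetical. The sketch assigns each service to whichever zone currently has the most free capacity, regardless of whether that zone is on-prem or backed by a cloud provider.

```python
from typing import Dict


def place_services(
    demand: Dict[str, int],
    zone_capacity: Dict[str, int],
) -> Dict[str, str]:
    """Greedily assign each service to the zone with the most free cores.

    `demand` maps service -> cores needed; `zone_capacity` maps
    zone -> cores available. Raises ValueError if a service does not fit.
    """
    free = dict(zone_capacity)  # copy so the caller's data is untouched
    placement: Dict[str, str] = {}
    # Place the largest services first; they are the hardest to fit.
    ordered = sorted(demand.items(), key=lambda kv: (-kv[1], kv[0]))
    for service, cores in ordered:
        # Iterate zones in name order so ties resolve deterministically.
        zone = max(sorted(free), key=lambda z: free[z])
        if free[zone] < cores:
            raise ValueError(f"no zone has {cores} free cores for {service}")
        placement[service] = zone
        free[zone] -= cores
    return placement
```

When a new zone comes online, rerunning a function like this with the extra capacity yields an updated placement without engineers choosing zones by hand, which is the kind of manual work Schwarz says Up removes.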
