How the Bing Development Team Used Agile to Sprint Ahead
Modern software practices frequently involve some combination of agile development, continuous integration, and automated testing. The development team behind Bing recently switched to a new development model built on these concepts, which they refer to as Continuous Delivery.
Bing’s team faced both mental and technical challenges in adopting the new philosophy. The original team of 100 developers swelled to over 600 developers after 2 years, accompanied by a 10-fold increase in the scope of their automated software tests.
InfoQ had the opportunity to speak with Craig Miller, Technical Advisor at Bing, who provided additional details on the topics raised in the development presentation:
InfoQ: What software is used to manage the testing, rollout (rollback), and deploying of new code?
Miller: We leverage both open source software (such as PhantomJS for testing HTML pages) and Azure core infrastructure.
Testing: We run 20,000 tests per checkin, which results in more than 20 million test executions a day. This is a critical component of our continuous delivery system as it ensures the code branch is ready to deploy AT ANY TIME. The system is largely hands-off.
Deployments are quite simple: they are just file copies (think xcopy), so pushing code out is trivial. When code is pushed out, the runtime system switches to the newly deployed code by starting up the new process and sending new connections from our routers to it. The old process is drained of connections and shut down. (We use ASP.NET/IIS as our core serving infrastructure, so the code is encapsulated in a mix of ASPX code and DLLs.)
Rollback is literally the same process in reverse: start up the old code (which is still sitting on the machine) and shut down the new code. In the last 2 years we’ve done rollbacks maybe 10 times. They just don’t happen often because our testing regime and our code protection strategies (such as using experimentation to control access to new code paths) help reduce our surface area greatly.
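As an editorial illustration, the start, drain, and rollback sequence Miller describes can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not Bing's actual tooling: the `Process` and `Machine` classes, their methods, and the version labels are all hypothetical, and draining is reduced to resetting a counter.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Process:
    """Serving process for one code version (hypothetical model)."""
    version: str
    running: bool = False
    connections: int = 0


@dataclass
class Machine:
    """One server that keeps several code versions on disk (assumed)."""
    processes: Dict[str, Process] = field(default_factory=dict)
    active: Optional[str] = None  # version that receives new connections

    def copy_files(self, version: str) -> None:
        # Deployment step: the files land on disk but are not live yet.
        self.processes.setdefault(version, Process(version))

    def switch_to(self, version: str) -> None:
        # Start the target process and send new connections to it.
        self.processes[version].running = True
        previous, self.active = self.active, version
        if previous is not None and previous != version:
            old = self.processes[previous]
            old.connections = 0  # stand-in for waiting until connections drain
            old.running = False  # files stay on disk, so rollback stays cheap

    def accept_connection(self) -> str:
        # New connections always go to the active version.
        process = self.processes[self.active]
        process.connections += 1
        return process.version
```

In this model, calling `copy_files("v2")` followed by `switch_to("v2")` stands for a deployment, and a later `switch_to("v1")` stands for a rollback, since the old files never left the machine.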
InfoQ: How do rollouts happen? They mention six data centers; do they deploy to one at a time? How does the new code "turn on"? (Since there isn't an exe to deploy, how does an end user at https://www.bing.com go from Bing version XXXX to XXXX+1?)
Miller: We use a canary deployment strategy that pushes from smaller rings to larger ones. Simply put, we’ll deploy to a couple of racks of machines (20% of a given data center) and let that run for a few minutes until the system is confident things are working well. We have hundreds of validations happening in production that look for anomalies and alert if they find one. Some of those alerts can automatically stop our automated deployments and trigger a rollback if serious enough. This is all done by software.
Once the first set of racks (we call them scale units) is looking good, we then deploy to another 20% in the same DC. We do this until we get to 100% in that DC. Once that DC is complete, if everything is running smoothly, we push to all the remaining DCs at approximately the same time, still using the 20% rollout strategy within each DC.
Users rarely ever notice a deployment since we manage traffic tightly and ensure no user is dropped while we’re deploying. Every customer counts to us so we never want someone to see the dreaded “we’re upgrading your service” message.
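The ring-by-ring rollout in this answer can be sketched as follows. This is an illustrative Python sketch only: the function names, the data center labels, and the `is_healthy` callback (standing in for the production validations) are assumptions, and the 20% step size is the figure quoted in the interview.

```python
from typing import Callable, List, Tuple

STEP_PERCENT = 20  # share of a data center updated per step, per the interview

Wave = List[Tuple[str, int]]  # (data center, percent deployed so far)


def rollout_plan(datacenters: List[str]) -> List[Wave]:
    """First data center alone in 20% steps, then all others in parallel."""
    first, rest = datacenters[0], datacenters[1:]
    percents = range(STEP_PERCENT, 101, STEP_PERCENT)
    waves: List[Wave] = [[(first, p)] for p in percents]
    if rest:
        waves += [[(dc, p) for dc in rest] for p in percents]
    return waves


def run_rollout(datacenters: List[str],
                is_healthy: Callable[[str, int], bool]) -> bool:
    """Walk the plan; stop at the first unhealthy step."""
    for wave in rollout_plan(datacenters):
        for dc, percent in wave:
            if not is_healthy(dc, percent):
                return False  # the caller would halt and trigger a rollback
    return True
```

For three data centers this produces ten waves: five for the first data center alone, then five in which the remaining two advance together.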
InfoQ: How much code is deployed "per release"?
Miller: Depends on the hour, day, time since the last deployment, deadlines for teams. In general, when we deploy 4 times in a single day, there are about 6 hours in which any checkin will be included in the next deployment. We have 600 engineers working in this branch that deploys continuously, so typically we’ll see 100 changes go out with each release. Again, if we don’t deploy for a day (because of a failure in testing that doesn’t get resolved right away), then we can see up to 400 changes in a single release.
Our goal is to make releases a non-event (which they largely have become). The system will automatically deploy when it has a good build that has passed all tests and the prior deployment has finished. Right now that sets a maximum release cadence of 4 times per day (one release every 6 hours) due to our canary deployment strategy above.
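The figures in this answer fit together arithmetically, which a short Python sketch makes explicit. The constants are the approximate numbers quoted in the interview; the function names and the assumption that changes arrive at a steady rate are illustrative only.

```python
HOURS_PER_DAY = 24
RELEASE_WINDOW_HOURS = 6   # roughly one release every 6 hours, per the interview
CHANGES_PER_RELEASE = 100  # typical figure quoted in the interview


def max_releases_per_day(window_hours: int = RELEASE_WINDOW_HOURS) -> int:
    """Upper bound on daily releases when each one occupies a full window."""
    return HOURS_PER_DAY // window_hours


def can_deploy(build_passed_all_tests: bool,
               prior_deployment_finished: bool) -> bool:
    """The two conditions named in the interview for an automatic deploy."""
    return build_passed_all_tests and prior_deployment_finished


def accumulated_changes(hours_since_last_deploy: int) -> int:
    """Changes waiting for the next release, assuming a steady checkin rate."""
    return CHANGES_PER_RELEASE * hours_since_last_deploy // RELEASE_WINDOW_HOURS
```

With these numbers, `max_releases_per_day()` gives 4, and `accumulated_changes(24)` gives 400, matching the "up to 400 changes" Miller mentions for a day without a deployment.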
InfoQ: When they say multiple releases per day, what defines a release?
Miller: When the code in a branch is taken and pushed to all production servers. This includes all changes made since the last deployment. We’ve had deployments with only 1 change in them because we believe in making releases cheap/safe. We deploy at any time if the system is ready. We have such confidence in our testing and deployment system that no one bats an eye when we deploy 8 times over a weekend and not a single person is called to look at an issue. We believe this is the way it should be and are annoyed when it doesn’t happen.
InfoQ: How are things rolled back in the event a test fails if multiple developers are submitting code for deployment?
Miller: Every single change is tested PRIOR to being included in the build that will be deployed. We never deploy untested code, period. If a test fails that checkin is blocked. ...And, with the way we deploy, this means that if I have something broken in my checkin only my code is rejected.
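A per-checkin gate of the kind described here might look like the following Python sketch. It is a simplified illustration, not Bing's system: checkins are modeled as plain values, tests as functions that return `True` on success, and the name `gate_checkins` is hypothetical.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Checkin = TypeVar("Checkin")


def gate_checkins(
    checkins: Iterable[Checkin],
    tests: List[Callable[[Checkin], bool]],
) -> Tuple[List[Checkin], List[Checkin]]:
    """Test each checkin on its own; a single failure blocks only that checkin."""
    accepted: List[Checkin] = []
    rejected: List[Checkin] = []
    for checkin in checkins:
        # Every test must pass; no failures are tolerated.
        if all(test(checkin) for test in tests):
            accepted.append(checkin)
        else:
            rejected.append(checkin)
    return accepted, rejected
```

Because each checkin is evaluated independently, a broken change from one developer never prevents the other accepted changes from going into the next build.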
InfoQ: What defines “business hours” for Bing?
Miller: We don’t have any. Neither do our customers. We’re a global service and as they say “it’s 5 o’clock somewhere”. We’re always on.
InfoQ: Are code proposals and deployments to test servers combined with those of other developers? If so, who has responsibility for failure? Or is each patch segmented from others?
Miller: Every checkin is isolated from others when testing is executed. Your code must withstand the tests in isolation. If it passes 100% (we allow NO test failures) then your code is in and will be deployed automatically at the next opportunity.
We have also combined the concepts of “test servers” and “production servers”: we don’t have test servers, we only have production servers. However, we send only test traffic, and no customer traffic, to a subset of the production servers. This means we don’t maintain test servers independently. We manage them just like prod servers; we just isolate their traffic.
InfoQ: Clearly the introduction of continuous delivery has had significant positive results for the Bing team. But in the beginning, success is not assured, and in general people are resistant to change. How was the original idea of continuous delivery introduced to the team?
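The idea of identical servers that differ only in the traffic they receive can be illustrated with a small Python sketch. The `Server` class, the pool labels, and the `eligible_servers` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Server:
    """A production server; only its traffic pool differs (assumed model)."""
    name: str
    pool: str  # "customer" or "test" (hypothetical labels)


def eligible_servers(servers: List[Server], is_test_traffic: bool) -> List[Server]:
    """Test traffic goes only to the test pool, customer traffic only to the rest."""
    wanted = "test" if is_test_traffic else "customer"
    return [server for server in servers if server.pool == wanted]
```

In this sketch, moving a machine between roles is only a change of its `pool` label, which mirrors the point that the servers themselves are managed identically.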
Miller: My manager, a peer of mine, and I kicked off the concept of targeting continuous delivery and lowering the barrier to getting code into customers' hands. We challenged ourselves and the team to improve our deployment systems by an order of magnitude. We knew it would be hard to do, and that there might be some discomfort in the organization, but we also knew that the improvements we made in the end would be worth it.
Now, engineers on our team are fully on board. The engineers get to write code and deploy it to customers very quickly, which helps them feel connected to the customer. An added bonus is that work life balance has improved dramatically, meaning we have happier engineers who are delivering more value to customers than ever before.
InfoQ: How did this introduction go? Were there any challenges?
Miller: To implement continuous delivery across the organization, there were some cultural obstacles that we had to overcome. Primarily, we had to overcome people’s deep knowledge about how to write software. When you have 15 years of experience at something, many habits are very hard to break, especially when you have little counter-evidence around you. Additionally, there was a limited set of examples to look to (at least 4 years ago) that would tell you continuous delivery was going to work. Lastly, human nature is to be risk-averse and favor evolution over revolution. Continuous Delivery is very much a revolutionary course that will alter the way you create software, manage your quality, and deliver ideas to your users. It’s a journey worth taking for sure.
InfoQ: How would you characterize the use of Continuous Delivery today at Bing?
Miller: At Bing, we have a lot of things in our favor that help us make the change to continuous delivery.
- Our competition challenges us every day, and that helps us be data-driven and focused on metrics that help us understand our true north for customers.
- We have faith in our engineers to make good decisions, and we organize ourselves in a way to encourage these decisions. We brought ownership of agility to the entire organization so that everyone was invested and responsible for it.
- Our team has a history of creativity and experimentation. Once we improved code velocity, this lowered the barrier to creativity even more and encouraged great ideas that could directly benefit customers.
- We don’t ship a feature if it fails one of our 20,000 automated tests. This removes debates about shipping buggy code. If a test fails, that check-in is blocked, and the feature doesn’t ship.
- Support from our senior executives was also very helpful. They could see the promise of continuous delivery and helped us spread the message of it across the organization.
Thanks again to Craig Miller and the Bing team for agreeing to this interview.
About the Interviewee
Craig Miller, Technical Advisor at Bing, has been with Microsoft for 18 years. He’s passionate about working with engineers to produce great products and services that serve millions of people every day.