How TFS Embraced 3-Week Release Cycles
Buck Hodges argues that long release cycles led to unhealthy development practices on the Team Foundation Server team, but there is more to short release cycles than just shipping with each sprint. How the software is planned and developed has to change at the same time.
Originally Team Foundation Server was being released on multi-year cycles. According to Buck Hodges of Microsoft, this was leading to unhealthy behaviors for their developers. When faced with hard milestones, developers would often cram incomplete features into the product. The developers figured that it would be better to try to fix the remaining bugs during the long stabilization phase than to allow the feature to be delayed by two or three years. As a result, there were times during the release cycle when TFS had over two thousand active bugs.
Another problem is the concept of Internet time. Users of online services expect frequent updates and improvements and get easily frustrated when forced to wait for bug fixes. So when planning their move to a hosted, online version of TFS, the idea of telling customers to expect updates every few years just didn’t seem feasible. (Note: The online version has just left beta and is now named Team Foundation Service.)
Short release cycles don’t necessarily mean a lack of direction. TFS developers start with a high-level storyboard representing where they want to be 18 months from now. That is then broken down into 6-month thematic plans where they concentrate on a given aspect of the application.
The TFS team has 130 members. These are divided into feature teams that represent major areas such as version control, work item tracking, and automated build. Each team of about 12 (6 developers, 5 testers, and 1 or 2 PMs) controls its own backlog of features.
In order to keep marketing and engineering decoupled, features may be shipped to Team Foundation Service without being turned on. This allows for further testing and the ability to bundle features together for major product announcements.
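The idea of shipping a feature turned off can be illustrated with a minimal feature-flag sketch. This is not how Team Foundation Service implements it; the `FeatureFlags` class, its method names, and the feature names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Hypothetical in-memory flag store; a real service would persist this."""

    enabled: dict = field(default_factory=dict)

    def deploy(self, name: str) -> None:
        # Deploying the code registers the feature but leaves it turned off.
        self.enabled.setdefault(name, False)

    def turn_on(self, *names: str) -> None:
        # Several features can be switched on together, e.g. for an announcement.
        for name in names:
            if name not in self.enabled:
                raise KeyError(f"{name} has not been deployed")
        for name in names:
            self.enabled[name] = True

    def is_on(self, name: str) -> bool:
        # Features that were never deployed are treated as off.
        return self.enabled.get(name, False)


flags = FeatureFlags()
flags.deploy("feature_a")  # hypothetical feature names
flags.deploy("feature_b")

# Both features are in production but invisible to users.
print(flags.is_on("feature_a"), flags.is_on("feature_b"))

# Marketing later decides to announce them together.
flags.turn_on("feature_a", "feature_b")
print(flags.is_on("feature_a"), flags.is_on("feature_b"))
```

The point of the design is that deployment and activation are separate steps, so engineering can ship on its own schedule while the decision to expose a feature is made independently.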
For TFS 2012, Scrum was adopted as the development methodology. In their implementation of Scrum, a three-week cycle was chosen. They felt that shorter cycles would add proportionately more administrative overhead, while longer cycles would make course corrections harder and lead to the problems that faced previous versions.
Another reason for choosing Scrum was to test the software the same way it would be used. According to Microsoft, Scrum is the most popular methodology amongst its customers.
Cross-team communication is focused around emails sent at the beginning and end of each sprint. The beginning-of-sprint emails outline the stories each team intends to work on, while the completion emails contain demos of the finished features.
Even though TFS went to three-week sprints, releases could still be four months apart. This meant every release was very large and problematic. To reduce the size of each release they started looking at one-month cycles, but that was so close to their sprint cycle that they decided to make them one and the same.
Adopting this required dropping the idea of a stabilization phase where bugs are fixed. After each three-week sprint there is a week of verification. This isn’t for bug fixing; anything that doesn’t work has to be disabled or removed. The verification week does overlap the start of the next sprint.
If a feature requires the entire three-week sprint, then again it must be disabled for that sprint’s release. During the next sprint the testers will verify it and, if all goes well, turn it on for that deployment. Shorter features that only take a week or two to implement can be tested and deployed in a single sprint.
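The rule for when a feature can ship turned on reduces to a small piece of arithmetic. The sketch below only covers features that fit within one sprint, since that is all the description above specifies; the function name `enabling_sprint` and the sprint numbering are assumptions made for illustration.

```python
SPRINT_WEEKS = 3  # sprint length described in the article


def enabling_sprint(start_sprint: int, weeks_to_implement: int) -> int:
    """Return the sprint whose deployment can ship the feature turned on.

    Assumes work begins at the start of `start_sprint` and takes between
    one and three weeks. Longer features are not covered by this sketch.
    """
    if not 1 <= weeks_to_implement <= SPRINT_WEEKS:
        raise ValueError("this sketch only covers features of 1 to 3 weeks")
    if weeks_to_implement < SPRINT_WEEKS:
        # Time remains in the sprint for testing, so it ships enabled.
        return start_sprint
    # The whole sprint went to implementation: ship disabled now,
    # verify during the next sprint, and enable with that deployment.
    return start_sprint + 1


for weeks in (1, 2, 3):
    print(f"{weeks}-week feature started in sprint 10 "
          f"is enabled in sprint {enabling_sprint(10, weeks)}")
```

Under these assumptions a one- or two-week feature is enabled in sprint 10, while a three-week feature is deployed disabled in sprint 10 and enabled in sprint 11.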
Tests are all automated and are triggered on a rolling basis. Currently the tests take 2 to 3 hours to run. Due to these long test cycles, checkins are not required to pass all of the tests. Rather, they set up TFS to only require that a checkin can be built. Below is an example of the checkin email that a developer would get.
The Service (the online version) and the Box (the version customers install) use the same code base. Furthermore, most of the features are developed against the main branch. Only significantly disruptive features are developed in separate branches. Feature branches are only merged back into the main branch when the feature is complete and they are at the beginning of a sprint.
At the end of a sprint the changes are moved from the main branch to the production and quarterly update branches. These branches inherit the version number from the nightly build.
You can watch the full presentation on Channel 9.