Kanban at Scale – A Siemens Success Story
This article tells the story of our experience implementing Kanban at Siemens Health Services (HS). It describes an internally driven and remarkably smooth implementation approach which very quickly rewarded Siemens with real and sustainable improvements in predictability, efficiency and quality.
I will explore the challenges that precipitated the change, six years after adopting Agile and Scrum, and show why Siemens chose the Kanban method to address them. While conventional Scrum/XP provides many great practices, Siemens continued to encounter key gaps in our ability to achieve predictable outcomes, and momentum towards continued improvement in operational efficiency and quality was faltering. Furthermore, metrics based on relative story points and velocity charts were not providing sufficient insight to manage release development at this scale. In developing large-scale enterprise systems in a complex and highly regulated domain such as healthcare, providing and meeting accurate release forecasts are essential customer and market imperatives.
In this article I will show you how Kanban overcame these challenges, improving and reinvigorating Siemens’ agile adoption. I will demonstrate the benefits of “flow” and its advantages in terms of actionable metrics and forecasting capabilities, based on real data captured from recent releases. Our experience proves that a lean and systemic approach using continuous flow (rather than time-boxing), visualization, limiting work in progress, and investing heavily in associated metrics can bring about fundamental and sustained improvement without compromising an Agile approach to software development. Furthermore, understanding flow and the metrics of flow allowed Siemens to take specific action in order to improve overall predictability and process performance.
We used a big-bang approach to implementation, based on the belief that in order to achieve results at this scale, we would need to fully deploy Kanban across all the teams as well as at the enterprise program management level. Furthermore, evolutionary continuous improvement would occur only once cycle times were stabilized, and this required full implementation. One could say that our approach required revolutionary change first, in order to enable ongoing, evolutionary improvements to the issues and systemic problems that the Kanban method would expose.
Siemens Health Services (HS), the health IT business unit of Siemens Healthcare, is a global provider of enterprise healthcare information technology solutions. Our customers are hospitals and large physician group practices. We also provide related services such as software installation, hosting, integration, and business process outsourcing.
The development organization for Siemens HS, known as Product Lifecycle Management (PLM), consists of approximately 50 teams based primarily in Malvern, Pennsylvania, with sizable development resources located in India and Europe. In 2003 the company undertook a highly ambitious initiative to develop Soarian®, a brand new suite of healthcare enterprise solutions.
Our key business challenge is to rapidly develop functionality to compete against mature systems already in the market. Our systems provide new capabilities based on new technology that helps us to leapfrog the competition. To that end, we adopted Agile in 2005, and more specifically Scrum/XP practices, as the key vehicles to achieve this goal.
As a development organization we enthusiastically embraced Agile, but we were under no illusions: we understood from the outset that this was a journey. We engaged many of the most well-known experts and coaches in the Agile community and adopted an accelerated though empirical and evolutionary approach to learning and absorbing new practices. We were happy to see rapid improvements early on and continued to make ongoing incremental improvements; by 2011, having adopted most Scrum and XP practices, we had become a mature Agile development program providing new feature releases approximately once a year. We had built mature scrum teams and roles, developed a mature product backlog process, and ran 30-day sprints with formal sprint planning, reviews, and retrospectives. Practices such as CI, TDD, story-driven development, continuous customer interaction, pair programming and relative point-based estimation were steadily integrated and largely became standard practices across all our teams.
Our Traditional Metrics Failed Us
In this section I will describe the problems we had been experiencing over the years with conventional Scrum/XP and how we shifted our paradigm to systems and lean thinking which led us to Kanban.
Despite our best efforts, something was wrong. We were continually challenged to meet committed release dates. Release deadlines were characterized by intense pressure with large amounts of overtime. In the trenches, our teams experienced difficulties planning and completing stories in time-boxed sprint increments. The last week of each sprint was always a mad rush by teams to claim as many points as possible, resulting in hasty and over-burdened testing. Story “doneness” required rigorous testing by a Test Engineer in a fully integrated environment.
While velocity charts based on relative story points at sprint reviews often seemed good, they were not always giving us an accurate picture. Team burn-up/down charts were not providing the transparency needed to clearly gauge completion dates or provide actionable insight into the state of our teams. Managing a development program at this scale requires tremendous organizational cohesion, dependency management, and continuous enterprise integration. Our traditional agile metrics were simply not sufficient for us to manage software product development at this scale and complexity. The teams were starting too many features, which were not reaching completion until the end of the release, and we were not seeing a clear picture of where our systemic bottlenecks were. During that period we operated under the assumption that if we mastered Agile practices, planned better, and worked harder, we would be successful. At Siemens, these challenges are significant. Regulatory and customer commitments require a high degree of certainty and predictability. Corporate decision checkpoints and quality gates demand firm commitments, and revenue is tied closely to budget, schedule, and scope, all of which are fixed.
In November 2011 executive management chartered a small team of director-level managers from across the Product Lifecycle Management (PLM) organization to coordinate and drive process improvements focused on predictability, quality, and efficiency across the PLM organization.
Based on the writings of Peter Senge, Russell Ackoff, Eliyahu Goldratt and Donald Reinertsen, we turned in a different direction: Systems Thinking, Lean Thinking and Queues.1 Firstly, we realized that many of our process improvements were not bringing about systemic improvements because we were addressing specific practices within each functional domain (analysis, coding, testing, etc.) rather than across the whole system or value stream; in short, we were sub-optimizing. Secondly, we had been ignoring lean principles such as the impact of large batch sizes, queues, and resource utilization. Understanding that the overtime, for which programmers were sacrificing their weekends, might actually be pushing out the release completion date was an epiphany. Our problem lay not in our people or skills, but in the amount of work in progress (WIP), or in “throughput management” as we described it. We realized that we were in a state of continuous product development; therefore, in spite of the obvious differences between manufacturing and software engineering, many of the assumptions related to manufacturing and flow could and should apply to product development. This also opened the possibility that many of our planning assumptions based on project management disciplines (rather than product development) would need to be reevaluated.
Inevitably, this path led us to the Kanban method, which provided a recipe for what we needed to do. The visualization would provide insight into systemic problems and patterns across the value stream and act as a catalyst for continuous improvements. The metrics and charts afforded by Kanban provide even further transparency and are actionable. By transparency, we mean that the metrics provide a high degree of visibility into the team’s and/or program’s progress; by actionable, we mean that the metrics themselves will suggest the specific team interventions needed to improve the overall performance of the process. Furthermore, measures such as “cycle time” and “throughput” are tangible and, unlike relative story points and velocity, are familiar and understandable within the general corporate culture.
By December of 2011, we had gained support and funding from executive management for a Kanban initiative.
Process Improvement from the trenches
In the following section, I will describe Siemens’ approach to process improvement. I will describe the grass-roots character of the process improvement team and Siemens’ belief in relying largely (with some exceptions) on internal experts to lead change. I will also explain why Siemens chose a big-bang approach using Ackoff’s Idealized Design2 change method, invested heavily in metrics, and made key implementation decisions related to Agile practices, workflow, policy, quality, training and use of internal coaches.
A Kanban design and implementation team known as the “Flow Team” was created with the objective of designing the policies and practices and managing the rollout and adoption of the Kanban method. The team included managers, developers, scrum masters, analysts, testers, and release managers from all corners of the PLM organization. Our past experience with improvement initiatives had taught us that any process improvement team must be composed of folks drawn directly from the grass-roots level who are representative of the teams and who will be actively implementing and operationalizing the change initiative in the trenches.
At Siemens we had also learned through experience that in order to make a new process sustainable, it needs to be led internally and requires a high level of internal expertise and competency in the process. Our reliance on outside experts had to be limited. Therefore, working together, the Flow Team educated themselves, designed the Kanban adoption, and drove the implementation. During operationalization of Kanban, the Flow team morphed from a design into an operational team, working closely with the program core team in coordinating and managing release development as we rolled out the new Kanban method and metrics. The program core team could be thought of as senior managers responsible for managing releases at the enterprise level.
At the same time, members of the Flow team acting as Kanban coaches deployed across all our Scrum teams to help assimilate and manage the change. The team understood that some of the key concepts such as limiting work in progress, pull vs. push and resource utilization could seem non-intuitive and ran counter to our current practice. Assimilating these concepts and adopting a “lean” mindset would take time: formal education and training activities would not be enough, so we looked to coaching as key to successful adoption among the teams and line managers. To that end, the team reached out to external consultants from the Agile coaching community for help.3 Their training and guidance helped us enormously as the use of coaching played a vital role in the success of the implementation.
Revolution before Evolution
In somewhat of a departure from established Kanban dogma, we approached the initiative not as an evolutionary continuous improvement effort but as a holistic redesign. Borrowing from Russell Ackoff’s principles of “Idealized Design,” our focus was on transforming the whole system rather than improving parts of the system. We would act as if we were free to replace the existing system with whatever system we wanted and design a system to replace the existing system right now. We would start from the results we wanted and work backwards.
We would ensure participation of everyone who would be affected, and ownership of the resulting plan would be spread widely among those who had a hand in preparing it. We would avoid detailed discussion of the constraints within the current system; only once the new system was designed would we discuss, resolve and/or dissolve any current barriers and constraints. The Flow Team believed that if a paradigm shift was needed, a revolutionary rather than evolutionary approach would be more effective in achieving results.
Continuous and evolutionary improvement would occur only once the redesign had been implemented. Once the new system was in place, Kanban would act as a catalyst for process improvement through visualization of the work-units and the associated metrics, which would identify waste, variability, and bottlenecks. However, this type of continuous improvement could only occur once we achieved much higher levels of predictability, and this could only happen once the Kanban method, including WIP limits, had been fully implemented.
Furthermore, based on our previous experiences in rolling out changes at this scale, a more evolutionary or piecemeal approach would not suffice. Our development programs generally consist of 15 to 20 Scrum Teams, where each scrum team focuses on a specific business domain; however, the application itself requires integrating all these domains into a single unitary customer solution. At this scale of systemic complexity, dependency management, and continuous integration, a very high degree of consistency and cohesion across the whole program is required. If we were to show results at this scale, we would need to fully deploy Kanban across all the teams with consistent practices and policies applied across the whole program. This effort would not only entail practice changes within our scrum teams, but also require operationalizing and aligning management at the enterprise level to Kanban practices, new metrics, and terminology.
Enterprise versus Team level
Early on in our discussions we had debated whether we could implement Kanban at the enterprise level without changing anything at the team level. In this way the enterprise program Core Team would control and limit the amount of work (specifically features) going into the teams. This could make the adoption easier, by not having to make any real change in team practices. Ultimately, the Flow Team decided that full adoption by the teams was necessary in order to achieve results. We would install Kanban boards in every team. The Flow Team believed that we also required an enterprise Kanban board, which would provide visibility into the many cross-team integration tasks and dependencies, as well as into enterprise-level activities such as performance and scalability testing. However, because of our time constraints, we postponed this activity as a future enhancement.
With this in mind, the Flow team designed and implemented a big-bang approach with a set of “explicit policies” to which all teams adhered and which provided a very high degree of work-unit, workflow, doneness and metric consistency. The team also concluded that electronic boards were needed on large monitors displayed in each team room that would be accessible in real-time to all our local and offshore developers. An electronic board would also provide an enterprise management view across the program and a mechanism for real-time metric collection for each team and across the program.
As our work-units, we kept our current Feature, Epic and Story hierarchy, with cycle time defined as the total elapsed time needed for a work item to get from “Specifying Active” to “Done.” Throughput was the number of work items that entered the “Done” step per unit of time (e.g. user stories per week). Note that throughput is subtly different from velocity. With velocity, a team measures story points per iteration. With throughput, a team simply counts the number of work items completed per unit of time. That unit of time could be days, weeks, or months.
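The two definitions above can be sketched in a few lines of Python. This is only an illustration: the story IDs, dates, and function names below are hypothetical and are not taken from the Siemens tooling, and elapsed time is counted here in calendar days.

```python
from datetime import date

# Hypothetical work items:
# (id, date it entered "Specifying Active", date it entered "Done").
stories = [
    ("S-1", date(2012, 5, 1), date(2012, 5, 15)),
    ("S-2", date(2012, 5, 3), date(2012, 5, 24)),
    ("S-3", date(2012, 5, 7), date(2012, 5, 18)),
    ("S-4", date(2012, 5, 10), date(2012, 6, 1)),
]


def cycle_time_days(started, done):
    """Total elapsed calendar days from "Specifying Active" to "Done"."""
    return (done - started).days


def throughput_per_week(items, window_start, window_end):
    """Number of items that entered "Done" inside the window, per 7-day week."""
    finished = sum(1 for _, _, done in items if window_start <= done <= window_end)
    weeks = ((window_end - window_start).days + 1) / 7
    return finished / weeks


cycle_times = {sid: cycle_time_days(start, done) for sid, start, done in stories}
weekly = throughput_per_week(stories, date(2012, 5, 7), date(2012, 6, 3))

print(cycle_times)
print(weekly)
```

Note that neither calculation needs a story point estimate: both are derived purely from the dates on which a work item changed columns.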
Picture of Team Kanban Board
Emphasis on Quality
As part of the implementation, the Flow Team put emphasis on reinforcing and improving our CI and quality management practices. Each column had its own doneness criteria, and by incorporating “doneness procedures” into our explicit policies, we were able to ensure that all quality steps were followed before moving a story to the next column, for example from “specifying” to “developing.” Most of these practices predated Kanban; however, the Kanban method provided more visibility, focus, and rigor for our test management practices. We also improved our defect management by displaying all defects on the board as a separate work-unit. The extent of the improvement in defect rates was one of our most pleasant surprises. By managing queues, limiting work in progress and batch sizes, and building a cadence through a pull rather than push system, we were able to expose more defects and resolve them in a more timely manner. On the other hand, “pushing” a large batch of requirements and/or starting too many requirements will delay discovery of defects and other issues, as defects are hidden in incomplete requirements and code. Figure 1 below compares the Kanban release to the release prior to Kanban and shows how Kanban both reduced the number of defects created during release development and minimized the gap between defects created and defects resolved during the release.
Figure 1: Quality compared between releases.
Kanban Augments Agile Practices
Some of the Flow Team’s most difficult and emotionally charged discussions involved changes to our current practices, many of which we ourselves had evangelized and implemented over the years as part of our Agile evolution and which had become ingrained in our vernacular and culture. The system we envisaged remained Agile in terms of all established Agile software engineering practices. We believe all of these are superior practices, and we had built up strong competencies which we saw no reason to change.
As a result, the implementation of Kanban caused very little disruption to the teams and organization in terms of roles or how they performed backlog management, analysis, coding and testing. Kanban augmented these practices by enforcing a lean methodology.
Practices related to time-boxed iterations were discarded in favor of continuous flow. Teams were asked to “stop starting and start finishing.” In lieu of sprint planning and sprinting, backlog management and story breakdown became continuous activities and teams pulled in new stories based on priority, their capacity regulated by WIP limits. The teams gradually discarded story points and focused on cycle-time and throughput.
As our “pilot,” we chose Soarian Financials®, the Siemens enterprise revenue cycle product development program that comprises approximately 500 employees across 15 scrum teams located in Malvern, Pennsylvania, Brooklyn, NY, and Kolkata, India. The next release would begin in May 2012. This timing and the big-bang implementation approach pushed the Flow Team to provide two days of training to the whole organization in two sittings held in Malvern, PA, and Kolkata respectively. The classes were held in large auditoriums where each Scrum Team was seated at their own team tables with their team Kanban coach. The training occurred in April 2012, a few weeks prior to the release kickoff.
Scrum Teams playing the “getkanban game” simulation during the training
In terms of providing the training, we decided to seek outside help. As luck would have it, we met an expert at a local event who had run the world’s first Kanban project at Corbis and was one of the founders of the Kanban methodology.4 Along with providing the training, this consultant continued to provide expert insight and advice on our design, mentored our coaches, deepened our understanding and, most importantly, opened our eyes to Little’s Law (which I will discuss further on in this article) and the metrics and conditioning of flow. It should be noted that although Siemens believes strongly in building internal experts, as mentioned previously, we could not have been nearly as successful without the help of this outside expertise.
Kanban Metrics provide transparency and are actionable
As mentioned earlier, metrics based on relative story points and velocity charts such as burn-ups/downs were not providing sufficient information to manage development at this scale. We replaced points and velocity with new measures such as throughput and cycle time and began using new charts such as cumulative flow diagrams and scatterplots. It was somewhat surprising that none of the available enterprise scrum tools provided the metrics and graphics in a format we needed. Therefore, the Flow Team made the decision to invest heavily in extracting the data from the scrum tool and deriving the metrics and producing these charts ourselves. At the team level the metrics would be used to manage WIP, see real and potential variability and enable teams to make adjustments; likewise at the enterprise level, managers would see capacity and other areas of systemic variability impacting the release. For example, using the cumulative flow diagram the management team was able to see higher throughput in developing versus testing across all teams and thus make a decision to increase test automation exponentially to re-balance capacity. Our ability to capture and use these metrics paid huge dividends.
The metrics and charts afforded by Kanban provide transparency and are actionable. By transparency, we mean that the metrics provide a high degree of visibility into the team’s and/or program’s progress, and by actionable, we mean that the metrics themselves will suggest the specific team interventions needed to improve the overall performance of the process. Furthermore, measures such as “cycle time” and “throughput” are tangible and, unlike relative story points and velocity, are familiar and understandable within the general corporate culture.
By using a scatter-plot and histograms, we could see cycle time trends at different percentiles, detect variability quickly, and take action. This level of transparency provides very quick insight into systemic variability and in a very real sense forces continuous improvement. We could now measure operational (or flow) efficiency by calculating the ratio of touch time/cycle time.
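As a minimal sketch of the two calculations mentioned above, the snippet below reads a percentile off a list of cycle times and computes flow efficiency as touch time divided by cycle time. The sample numbers are hypothetical, and the nearest-rank percentile rule is an assumption; charting tools may interpolate differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def flow_efficiency(touch_days, cycle_days):
    """Ratio of time actively worked on an item to its total elapsed cycle time."""
    return touch_days / cycle_days


# Hypothetical story cycle times in days.
cycle_times = [5, 8, 12, 13, 15, 18, 21, 30, 44, 60]

p50 = percentile(cycle_times, 50)
p85 = percentile(cycle_times, 85)
efficiency = flow_efficiency(touch_days=6, cycle_days=24)

print(p50, p85, efficiency)
```

Read this way, the 85th percentile is a statement such as "85% of stories finished in this many days or less", which is the form used in the results later in this article.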
Through the cumulative flow diagram5 (see citation below for more information on CFDs), we could track our throughput or run-rate (the number of stories we were completing per day, week, etc.). We had transparency into where we had systemic capacity imbalances and, more than anything, into how work in progress impacts cycle time and throughput across all functional roles.
We now understood that WIP, cycle time, and throughput are inextricably linked through a simple yet powerful relationship known as Little’s Law.6
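In the form commonly used for flow metrics (stated here from the general literature on Little’s Law, as an assumption about the form intended; it holds for long-run averages measured over the same period in a reasonably stable system), the relationship is:

```latex
\[
\text{average cycle time} \;=\; \frac{\text{average WIP}}{\text{average throughput}}
\]
```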
That is to say, change in any one of these metrics will almost certainly cause a change in one or both of the others. We could lower cycle time by decreasing WIP while holding throughput constant. We could increase throughput by decreasing cycle time and holding WIP constant. If any one of these metrics is not where we want it to be, then this relationship tells us exactly what lever(s) to pull in order to correct it. In short, Little’s Law is what makes the metrics of flow actionable.
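A small worked example makes the levers concrete. The numbers are hypothetical and the function names are illustrative; the arithmetic assumes the averages-based form of Little’s Law (average cycle time equals average WIP divided by average throughput).

```python
def average_cycle_time(avg_wip, avg_throughput):
    """Little's Law solved for cycle time (e.g. stories / stories per day = days)."""
    return avg_wip / avg_throughput


def average_throughput(avg_wip, avg_cycle_time):
    """Little's Law solved for throughput (e.g. stories / days = stories per day)."""
    return avg_wip / avg_cycle_time


# Hypothetical baseline: 60 stories in progress, 3 stories finished per day.
baseline_days = average_cycle_time(avg_wip=60, avg_throughput=3)

# Lever 1: reduce WIP to 45 while throughput stays at 3 per day.
lower_wip_days = average_cycle_time(avg_wip=45, avg_throughput=3)

# Lever 2: hold WIP at 60 while cycle time drops to 15 days.
higher_throughput = average_throughput(avg_wip=60, avg_cycle_time=15)

print(baseline_days, lower_wip_days, higher_throughput)
```

In this sketch, cutting WIP by a quarter shortens average cycle time from 20 to 15 days, and shortening cycle time to 15 days at constant WIP raises throughput from 3 to 4 stories per day.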
In May of 2012, Soarian Financials® kicked off the first Siemens release development using Kanban. The release ran for six “sprints” (although we were no longer technically “sprinting,” we retained monthly retrospectives as well as team and program reviews), with code freeze in October 2012. The decisions and actions of the Flow Team paid off. The adoption and implementation by the teams and the overall program was remarkably and unexpectedly smooth and enabled Siemens to reap excellent results. I discuss the results and lessons from the adoption in the next section.
Results from Kanban
In this section I will review the results that Siemens achieved. I will discuss the positive reception Kanban received from the teams and dive deeper into the metrics. Using the scatterplot and cumulative flow diagrams I will illustrate why WIP limits really matter and show the very significant reduction in cycle-times and improvements in predictability and throughput that Siemens achieved. I will also describe how Siemens was able to perform release and feature forecasts using historical cycle time metrics instead of estimation.
Folks on the Teams Responded Positively
The most tangible early indicator of success was the improved engagement of the teams. Within the first month, scrum-masters reported more meaningful stand-ups. Instead of the ritual that many of our stand-ups had often seemed, visualization of the work was creating a more collaborative environment. This sentiment was especially emphasized by our offshore colleagues, who now felt a much higher sense of inclusion during the stand-up. Having the same board and visualization in front of everyone made a huge difference on those long-distance conference calls between colleagues in diametrically opposed time zones.
While there was still some skepticism at the outset, as one would expect, overall comments from the teams were positive: people liked it. This was confirmed in an anonymous employee survey performed close to the end of the release. This engagement survey asked employees about several process, people, and infrastructure change initiatives that had occurred over the previous year, Kanban being one, albeit a major one. Employees were specifically asked to rate these initiatives and to provide free-form comments. Results for Kanban were very positive, with many enthusiastic comments in the free-form section. Below are samples of the free-form comments that employees expressed in the survey.
Sample of positive comments provided by employees in an engagement survey conducted close to the end of our first release using Kanban
WIP Limits Really Matter
Prior to implementing Kanban our cycle times had varied a great deal. As the scatterplot in Figure 2 below demonstrates, whereas at the 50th percentile our cycle times were 21 days or less, at the 85th percentile our cycle times were 71 days or less. Our teams were not paying attention to WIP, and as a result our story cycle times were unpredictable.
Figure 2: Cycle Times in the release before Kanban
The Flow Team had decided to delay making WIP limits mandatory until the third month of development. In some ways we thought of WIP limits as more of an art than a science and decided on trial and error as a means for each team to figure out their own WIP limits. This delay cost us, as revealed in Figure 3; without WIP limits, our cycle times demonstrated the same upward pattern that we saw in our previous releases.
Figure 3: Scatterplot early on in the first release with Kanban
This could also be seen clearly in our cumulative flow diagram (see Figure 4 below).
Figure 4: CFD Early on in the first release with Kanban
By using cumulative flow diagrams, we could clearly see how managing work in progress and aligning story arrival rate with departure rate impacts cycle time and predictability. Figure 4 above shows how the teams were bringing in more stories (arrivals) than they were completing (departures). Cumulative flow diagrams provided a full picture, at the individual team and program levels, of where our capacity weaknesses lay and revealed where we needed to make adjustments to improve throughput and efficiency. We could also use average throughput (story completion or departure rate per day or week) to forecast completion dates based on the number of stories remaining in the backlog.
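The reading of a cumulative flow diagram described above can be sketched numerically. The weekly counts below are hypothetical, not Siemens data: the top line of a CFD gives cumulative arrivals, the bottom line gives cumulative departures, and the vertical gap between them is WIP.

```python
# Hypothetical cumulative story counts sampled once per week from a CFD.
cumulative_arrivals = [10, 22, 35, 47, 60]
cumulative_departures = [4, 9, 15, 20, 26]


def average_rate(cumulative):
    """Average number of items per week between the first and last samples."""
    return (cumulative[-1] - cumulative[0]) / (len(cumulative) - 1)


def weeks_to_finish(remaining_stories, throughput_per_week):
    """Simple average-based forecast: remaining work divided by departure rate."""
    return remaining_stories / throughput_per_week


arrival_rate = average_rate(cumulative_arrivals)
departure_rate = average_rate(cumulative_departures)

# WIP at each sample is the gap between the two lines; it grows whenever
# arrivals outpace departures.
wip_by_week = [a - d for a, d in zip(cumulative_arrivals, cumulative_departures)]

forecast_weeks = weeks_to_finish(remaining_stories=66, throughput_per_week=departure_rate)

print(arrival_rate, departure_rate, wip_by_week, forecast_weeks)
```

In this made-up sample, stories arrive more than twice as fast as they depart, so WIP climbs every week. That is the pattern a widening band on the diagram represents. The forecast is a single average-based estimate and says nothing about the spread of likely dates.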
However, once the teams adopted WIP limits, cycle time stabilized. As Figure 5 demonstrates, cycle time at the 85th percentile fell to 43 days or less, and overall cycle time distributions were trending in much more predictable ranges. Compared with 71 days in the release before Kanban, this represents an improvement of roughly 40%.
Figure 5: Stabilized cycle times after introducing WIP Limits
Similarly, the cumulative flow diagram in Figure 7 below shows how the system stabilized after WIP limits were implemented.
Figure 7: CFD in the first release with Kanban after WIP limits were introduced
In our second release using Kanban, the metrics demonstrated that we had achieved a high degree of cycle time predictability. Cycle times demonstrated similar distributions and ranges as in our first release, but with improvements. At the 85th percentile, stories were finishing within 41 days, and variability was even better controlled. Viewing the two scatterplots in Figure 6 side by side bears this out:
Figure 6: Scatterplots of the first release using Kanban (above) and the second release of Kanban (below)
Reducing Cycle Times Increases Throughput
As figure 8 below shows, an improvement in cycle time resulted in an increase in throughput (or story completion rate). Comparing our first release using Kanban to our second, we reduced median cycle time by 21.05% which resulted in a 33% increase in throughput and/or number of stories completed in the same number of days across both releases, with no changes in story size or team composition.
* Measured over 149 days across each release
Figure 8: Further reduction in cycle-time in the second release of Kanban resulted in significant throughput improvement
Better Release Forecasts using Cycle Time
The fact that we had achieved predictable cycle time enabled us to provide much better release date predictions. We were introduced to a Monte-Carlo simulation modelling tool which provides a distribution of likely feature and/or release completion dates.7 Since we had already collected the metrics, the act of updating and using the tool was fairly simple. We piloted the tool in a couple of teams during our first release with Kanban and were extremely impressed with its accuracy. Therefore, by the time we started release planning for our second release, instead of days of planning games and story point estimation, the Scrum-Masters used the simulation tool fed by their team’s own historical metrics. Throughout the release, they continued to update the tool periodically to provide updated metrics, which in turn further improved accuracy. These results proved that our accuracy in making feature and release forecasts had been enhanced enormously.
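The internals of the simulation tool are not described here, so the following is only a generic sketch of the Monte Carlo idea, not the tool’s actual model. It assumes a team has a history of weekly throughput counts (the values below are hypothetical), replays randomly chosen weeks until the backlog is empty, and repeats that many times to get a distribution of completion times rather than a single date.

```python
import random


def simulate_weeks_to_finish(backlog_size, weekly_throughput_history, rng):
    """One simulated future: draw a past week's throughput at random until the backlog is done."""
    remaining = backlog_size
    weeks = 0
    while remaining > 0:
        remaining -= rng.choice(weekly_throughput_history)
        weeks += 1
    return weeks


def forecast(backlog_size, weekly_throughput_history, runs=10_000, seed=42):
    """Return the number of weeks within which 50%, 85% and 95% of simulated runs finished."""
    if max(weekly_throughput_history) <= 0:
        raise ValueError("history must contain at least one week with completed items")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = sorted(
        simulate_weeks_to_finish(backlog_size, weekly_throughput_history, rng)
        for _ in range(runs)
    )

    def at(pct):
        return results[min(runs - 1, int(pct / 100 * runs))]

    return {50: at(50), 85: at(85), 95: at(95)}


# Hypothetical history: stories completed in each of the last eight weeks.
history = [3, 5, 4, 6, 2, 5, 4, 7]
result = forecast(backlog_size=60, weekly_throughput_history=history)

print(result)
```

The output is read as a set of confidence levels, for example "85% of simulated runs finished within this many weeks". Re-running the forecast as new weeks of real data are added to the history mirrors the periodic updates described above.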
At the same time, as we analyzed our cycle-time metrics shown in Figure 9, there was minimal correlation between estimated story points and actual cycle time.
Figure 9: Sample from actual team data comparing story point estimates to actual story cycle times.
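One way to quantify a comparison like this is a Pearson correlation coefficient between the estimate and the measured cycle time. The sketch below uses made-up numbers, not the team data behind Figure 9, and the choice of Pearson correlation is an assumption rather than the method used for the original analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical stories: estimated story points and measured cycle time in days.
story_points = [1, 2, 3, 5, 8, 3, 5, 2, 8, 1]
cycle_days = [12, 30, 7, 15, 20, 41, 9, 18, 11, 25]

r = pearson(story_points, cycle_days)
print(round(r, 2))
```

A coefficient close to 1 would mean that larger estimates reliably go with longer cycle times; a value near zero means the estimate carries little information about how long a story actually takes.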
Back in November 2011, nobody at Siemens HS had heard of Kanban, but results after the first year convinced management to migrate all product lines (more than 40 teams across three continents) to Kanban.
Our experience convincingly shows that a lean and systemic approach using continuous flow (rather than time-boxing), visualization, limiting work in progress, and investing heavily in associated metrics can bring about fundamental and sustained improvement without compromising an Agile approach to software development.
Understanding flow, and more importantly the metrics of flow, allowed Siemens to take specific action in order to improve overall predictability and process performance. On this note, the biggest learning was that predictability is a systemic behavior that one has to manage by understanding and acting in accordance with the assumptions of Little’s law and the impacts of resource utilization.
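For readers unfamiliar with it, Little's law states that, for a stable system, average cycle time equals average WIP divided by average throughput. The Python sketch below shows the arithmetic; the function name and the numbers are hypothetical, and the result only holds when the law's assumptions are met (for example, work enters and leaves the system at roughly the same average rate and units are consistent).

```python
def average_cycle_time(avg_wip, avg_throughput):
    """Little's law: average cycle time = average WIP / average throughput.

    Assumes a stable system with consistent units
    (e.g. stories, and stories per day).
    """
    if avg_throughput <= 0:
        raise ValueError("average throughput must be positive")
    return avg_wip / avg_throughput

# Hypothetical numbers: the same throughput with two different WIP levels.
print(average_cycle_time(avg_wip=30, avg_throughput=1.5))  # 20.0 days
print(average_cycle_time(avg_wip=15, avg_throughput=1.5))  # 10.0 days
```

With throughput held constant, halving WIP halves the average cycle time, which is why limiting WIP is the main lever for stabilizing cycle times.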
Our experience also convinced us that our problems lay not in our people or skills, but in the amount of work in progress (WIP). We realized that we were in a state of continuous product development; therefore, in spite of the obvious differences between manufacturing and software engineering, many of the assumptions related to manufacturing and flow could and should apply to product development.
Furthermore, the experience confirmed our belief that predictability has to be achieved as a prerequisite for ongoing process improvement. Only once predictable and stable cycle times have been achieved by limiting WIP, which aligns demand and capacity, are you able to see the levers to address bottlenecks and other unintended variability. Continuous improvement in an unstable system always runs the risk that improvement initiatives result in sub-optimizations.
Finally, it is very important to note that Agile practices such as small, incremental, story-driven development, TDD, CI, etc., and self-organized, cross-functional Scrum teams enabled the success of Kanban. It is highly unlikely that Kanban by itself would have led to great results without these practices in place. By adopting continuous flow, we reinforced and augmented our agile adoption.
About the Author
Bennet Vallet has been at Siemens Health Services for over 20 years developing large scale healthcare information systems. Since 2005 Bennet has been involved in leading change and transformation initiatives including the adoption of Agile. In recent years Bennet has been leading the Siemens Health Services adoption of Kanban.
1 Peter Senge, The Fifth Discipline: The Art & Practice of The Learning Organization (New York: Random House LLC, revised edition, Mar 31, 2010); Russell L. Ackoff, Differences That Make a Difference (Devon, U.K.: Triarchy Press, 2010); Eliyahu Goldratt, The Goal: A Process of Ongoing Improvement (Mass.: North River Press, revised edition, 2004); Donald G. Reinertsen, The Principles of Product Development Flow (Ontario: Celeritas Publishing, 2009).
2 Russell L. Ackoff, Jason Magidson and Herbert J. Addison, Idealized Design (N.J.: Prentice Hall, 2006).
5 Donald G. Reinertsen, The Principles of Product Development Flow, chapter 3, "Managing Queues."
6 Little’s Law.
7 For more information on the tool and Monte Carlo simulation see Troy Magennis.
Kanban at Scale
Martien van Steenbergen
Thank you very much for writing it up and publishing it, Bennet.
Question, as you already read Donald Reinertsen's book about Product Development Flow, did you look into SAFe?
Best article I have seen in years - congratulations!
I learned about this FLOW/Throughput stuff in 2005 by reading the books of E. Goldratt. At that time we improved IT project organizations with critical chain management with huge success. And it was clear that agile (XP, Scrum) would be another breakthrough. But this first-generation agile showed problems in throughput, flexibility and reliability.
At the end of 2012 :-) I had the chance to lead an agile team for a big ISP (1&1). We took a slightly different route: we concentrated on reliability by adding elements from critical chain to Scrum, mainly the progress-buffer-consumption diagram (critical chain fever curve). At that point the sprints were not necessary anymore and we started to focus on FLOW. We skipped Kanban and went directly to Drum-Buffer-Rope, one small step further than Kanban ;-)
And exactly as described in the article, all the positive effects occurred. We reduced the WIP to the minimum. Our golden rule is “less open tasks than developers!”
Within weeks CI was established and productivity (throughput) rose drastically (sorry, I have no data on that, because the size of the stories changed massively. They got smaller, which is good by the way). What I can tell is that within 1.5 years and 5 releases, just two bugs slipped into production. And all 5 releases were on time and in scope. The most interesting effect was the increased agility. Because of the minimal WIP it was very easy to reprioritize the stories.
I don’t want to advertise, but you can see the product. Within four weeks the team built a preregistration database for the new top-level domains, which has to be high-performance and reliable. And within the two months after that, they built all the high-level provision processes including all the conversion stuff around them. You can have a look at it yourself: www.1and1.com/new-top-level-domains
So as I said ****** (six stars) – excellent article – congratulations
Wolfram Müller (Speed4Projects.net and Manager at 1&1)
p.s.: some hints/information, if you are interested in more. You can find more about this Critical Chain + agile blend under the label “reliable scrum”. The Drum-Buffer-Rope adaptation is named “ultimate scrum”. All of it is described in a very recent book I wrote together with Steve Tendon called “Tame the Flow” (published on Leanpub). Or even more, some ideas on how projects fit together with agile: www.youtube.com/watch?v=9SQbhAKq5_M
Re: Great article
Thank you for your comment. The answer to your question regarding SAFe is no; at the time we had not heard of SAFe. I have looked at it recently, though at a very cursory level, and I am not really seeing how it would provide more benefits. However, I am open to being educated on this topic.
Re: Excellent post!
1. Predictability in terms of meeting hard deadlines with fixed scope
2. Our ability to complete stories in sprint increments
3. Metrics based on points and velocity not being adequate to meet our program needs at this scale
It's interesting to see that story size and cycle time do not have any correlation.
Wonderful case study
Regarding the point about evolution vs revolution as the starting point, I see the journey Bennet describes as quite an evolutionary one. They started with how they were working and implemented core Kanban practices, including WIP limits at the team level.
They didn't change roles or processes. So I'm not sure why there's some discussion about this case study going in the face of the classic Kanban evolutionary change method.
I have to say that I personally started in an even more evolutionary fashion in some cases, with mixed results. I suspect there's some minimal level of change that is required to get traction.
The discussion about whether to insist on limiting WIP initially is certainly an interesting one :-)
And I'm starting to think that even the decision to move to using Kanban boards (instead of Excel/Gantt-based schedules) is a very dramatic change that doesn't fit the evolutionary change approach either. It's not an issue for organizations already using some sort of iterative delivery planning/tracking approach. But for organizations with heavy push-based schedules this is painful. More at LKNA14...
Thank you very much for sharing
Thanks tons for sharing. I agree with the others who said that we need more case studies like this, but I am grateful for any that I can find.
More case studies on Kanban
There are several other case studies and interviews with companies that have applied Kanban published recently on InfoQ:
- 3 years of Kanban at Sandvik IT: The Story of an Improvement Journey
- Improving Product Development with Flow Thinking with experiences from Ericsson
- Experiences from Applying Kanban at SAP
InfoQ would like to publish more case studies; please contact us if you want your story to be published.
Probably one of the best articles ever written about Kanban
Congratulations and hope for more articles like this.
Very thorough and well written case study
Story size, correlations
One question - you say that you achieved "33% increase in throughput and/or number of stories completed in the same number of days across both releases, with no changes in story size".
How do you know that the story size was consistent (how were stories created/sized)?
You also say there was "minimal correlation between estimated story points and actual cycle time." - could you state the correlation coefficient, please? The correlation doesn't look too bad, to me, given that cycle time and story points aren't measuring quite the same thing anyway.