Using Kanban to Turn Around Distressed Projects

This is a case study that describes how Kanban and lean development techniques were used to rescue a distressed project.

Background

This particular project was a custom development project for a large client that had been in progress for about a year.  The development team fluctuated in size from 10-15 team members.

The team started off the project using a typical waterfall development model. An analysis phase preceded the development phase. The analysis phase was supposed to take 2-3 months but ran over that time. During the analysis phase the scope began to grow beyond the initial understanding of both the client and the development team. Because the client insisted on getting the requirements "nailed down" and "signed off on", the analysis phase ended up taking about 6 months, which is at least double the planned duration.

Despite the fact that the analysis phase ran over the originally planned time, the project end dates were not pushed out to align with that reality. As a result, the development team was pressured to deliver more-than-anticipated functionality in a less-than-anticipated time frame. The development deadlines were missed, and since testing was bolted on at the tail end and compressed, the quality of the developed solution was inadequate.

The development team operated in an interrupt-driven manner because the project manager did not manage the client to protect the development team from distractions. The typical pattern was that the client would yell at her about the quality issues and then she would just pass that through to the development team. So whatever was the crisis of the moment was what received the attention. The development team was never able to focus on a problem and drive it to completion before the next problem interrupted their work.

By the time I got involved, the relationship between the client and the development team was damaged and the client was refusing to pay. The solution was unusable by their users from functionality, reliability, and performance standpoints. The project had run over budget by hundreds of thousands of dollars. The schedule was blown by months.  Neither the client nor the development team could afford the fallout of a failed project, so it had to be turned around.

Starting to Get Control

The first thing to do when confronted with this type of situation is not to panic. In this particular case, the client was very agitated and had zero confidence in the team. The client was blaming the team and the team was blaming the client. Morale was poor, and it was getting worse because management was demanding that the team work harder to fix the problems.

One of the worst things you can do is to dig your heels in and keep doing more of what got you in the situation in the first place.

Experience has shown that the best way to start removing the emotion from these types of situations is to work with the data and facts. It is important to understand the root causes of how the situation arose in order to best craft the path forward, but initially it is not helpful to present the team with all the things they should have done differently. That only inflames the situation further.

In this particular case, the main fact was that the product quality was terrible. The number of reported defects supported this. We had to get the product to a point where the users could actually do their jobs. And the only way we could start to do that was to create the environment for the development team to be able to focus on one problem at a time.

Create a Culture of Quality

It seems strange to think that despite all the advancements in software development we need to keep re-emphasizing the importance of building software that actually works correctly.  Yet this is one of the main areas where software projects continue to be challenged.

In the earlier part of the project, the development team spent months trying to document in great detail what they thought the client wanted. This led to the false sense of security that they (and the client) knew for sure what they were going to deliver in the end.  In addition, the only team members that were engaged with the client in the analysis phase were the analysts. The developers were not brought onto the team until the requirements were "done".

This led to the following phenomena:

  • The developers had to digest and interpret the requirements as part of a 200-300-page document as an atomic unit.
  • The client continued to make changes to the requirements even after the analysis document was "signed off" by both parties, which meant that the developers' work was continuously being invalidated.
  • This caused the development to push out beyond the originally planned development completion dates.
  • Testing happened at the very end after all the development was "done".  Since the original deadlines had passed, this meant that testing had to happen in a hurry.
  • Quality was less a measure of whether the developed software worked correctly and more a measure of how frequently testers and developers interpreted the requirements the same way.

The ultimate result of all this was that many hundreds of defects escaped development and made their way in front of the client.  Defects are the worst kind of waste because they are unimplemented requirements for software that has been developed already.  This means the client pays for software to be developed and then they (or the development team) have to eat the cost of making it work correctly.

Clearly the way the development team was working did not put quality first.  In fact, it put quality dead last. Since analysts, developers, and testers were matrixed onto the project to do their specific tasks they never really learned how to work as a cohesive unit that felt collective responsibility and ownership of the system quality. When problems arose they pointed the finger at one another. In short, they had failed to develop and embrace a culture of quality.

The single most important decision we made to start getting control over quality was to require acceptance tests to be developed for work items before a developer could write code. This is called Acceptance Test Driven Development (ATDD).  With ATDD, we literally put quality first.

The way we made ATDD work on this project was that we required a set of acceptance tests to be associated with each work item in the backlog.  This effectively documented the requirements that were unsatisfied by the developed code.  It also forced the tester and developer to get on the same page before coding started.  The developers' job became focused on making the failing acceptance tests pass. It is important to clarify that tests were developed on a work item by work item basis, not for a batch of work items at a time.

This approach takes the guesswork out of whether the developed code works properly or not. The test either passes or fails. Yes or no. True or false. Since each developer focused on one work item, the number of acceptance tests the developer had to understand at any point in time stayed manageable, in contrast to having to understand the entire analysis document up front.

The state transitions for an acceptance test were:

  • Failed.  The test initially fails because code has not been developed to make the test pass.
  • Developed.  The code to make the test pass has been developed.  The developer identifies this transition as they implement the code.
  • Passed.  The testers have verified that the developed code does in fact make the test pass.
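The three states above can be sketched as a small state machine. This is a minimal illustration rather than the tooling the team actually used: the names `AcceptanceState`, `AcceptanceTest`, and `work_item_done` are hypothetical, and the sketch only models the forward transitions listed above (the article does not describe what happens when a tester rejects the developed code).

```python
from enum import Enum


class AcceptanceState(Enum):
    FAILED = "Failed"        # no code yet makes the test pass
    DEVELOPED = "Developed"  # the developer reports the code is written
    PASSED = "Passed"        # a tester has verified that the test passes


# Forward transitions only, as listed in the article.
# Any rejection or rework path would be an extra assumption, so it is omitted.
ALLOWED = {
    AcceptanceState.FAILED: {AcceptanceState.DEVELOPED},
    AcceptanceState.DEVELOPED: {AcceptanceState.PASSED},
    AcceptanceState.PASSED: set(),
}


class AcceptanceTest:
    """Hypothetical record of one acceptance test attached to a work item."""

    def __init__(self, description):
        self.description = description
        self.state = AcceptanceState.FAILED  # every test starts out failing

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state


def work_item_done(tests):
    """A work item counts as finished only when it has tests and all passed."""
    return bool(tests) and all(t.state is AcceptanceState.PASSED for t in tests)
```

Requiring at least one test in `work_item_done` mirrors the rule that every work item had to have acceptance tests before coding started.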

Using the ATDD approach alone had the most positive transformative effect on product quality.

In the first 8 weeks of using ATDD, we transformed the project from one that had no documented test coverage to one that had a library of tests and a documented history of passing tests that asserted the overall quality of the code being developed.

Eliminate Waste and Control the Flow of Work with Constraints

As mentioned, the development team started off working in a waterfall approach.  However, things ended up breaking down into an ad-hoc approach once development began and the big-up-front-requirements document became invalidated by uncontrolled change.  From there, it didn't take long for the line from the code back to the requirements to become blurred and then eventually erased.

The profile of the work assignment was a classic push system with a very large batch size.  The development team was pushed a large requirements artifact from the analysis team.  The test team was pushed a large code base by the development team plus the very large requirements artifact from the analysis team.  During testing, when defects were uncovered, they were pushed to specific developers by a manager.  When the fixes were made, a manager pushed them to specific testers for verification.

This approach resulted in a lot of waste.  In the analysis phase, a vast amount of unimplemented requirements accumulated.  In the development phase, a vast amount of untested code was developed.  In the test phase, a vast amount of unimplemented and improperly implemented requirements were identified.

Decrease Batch Size

A large number of defects escaped development and made their way in front of the client. The origins of this phenomenon can be traced back in large degree to the batch size that the team was working with. The team simply did not have the capacity to move the entire batch forward as a unit within the available time while maintaining the integrity of the unit as a whole.

Trying to move such large work products forward resulted in each team becoming a bottleneck on the team that needed to perform the subsequent tasks.  In the end, it became a Sisyphean task that ended up sending the work products collapsing backwards from testing to development to analysis.  And the process kept repeating. In other words, all the work done up front getting the requirements "locked down" was complete waste because once defects emerged, the team had to keep asking themselves "now what was this supposed to do?" Even having to ask that question is wasteful.

Work as a Team

Breaking the destructive cycle and getting the team to a state where they could actually close work items and keep them closed meant changing the team structure and fundamentally altering the way that the team moved work items from Pending to Done. In fact, it meant changing their definition of the word "done".

The fact that the team members did not share a common sense of ownership of the solution was a major impediment.  The first thing we did was to dissolve the matrix organization.  Analysts, testers, and developers were now just part of the development team.  Along with this was directly setting the expectation that it was their collective responsibility to deliver a quality product and that they all owned quality - not just the testers.

We also physically co-located all the team members in a large conference room.  This further reinforced the fact that they were a single team with shared purpose.  It also meant that now they had to actually talk to one another instead of emailing each other from one cubicle to another.

From Push to Pull

The next thing we did was to remove tasking authority from the project managers.  That is, managers could no longer push work to the team.  As mentioned, the team was operating in an interrupt-driven mode, wherein a manager would task team members in an ad-hoc manner, which resulted in a low probability that the previous task actually got completed. 

To put some structure to the development effort and seal the exits on defects escaping development, we introduced a pull system using Kanban. The Kanban approach forced the team to work with smaller batch sizes. Initially, this was easy because all of the work items that needed to be completed were defects, which are usually pretty small to begin with. It also forced us to define the word "done".

With this approach, work items (depicted as cards) made their way from left to right on a Kanban board through a series of states (depicted as columns), starting with Pending and ending with Accepted. Whenever a card was ready to be worked on, a team member would pull it into the next column, which meant they were working on it. Team members had to apply their particular skill at the right places to keep the cards flowing. A work item was not "done" until it had passed through each of these states and ended up in the Accepted state. Done = Accepted.

Each column represents work that needs to be done to move the card forward.  In order to decrease coordination costs related to communication of completed tasks, we added ready states to indicate that the previous task was completed and now the next task is ready to be performed.  For example, when the acceptance tests have been developed, the work item is ready to develop code. So the Kanban board provides a simple visual mechanism that encapsulates the process that a work item needs to go through. It also provides a way to see the status of in progress work items at a glance.

The columns on our Kanban board are listed below. Note that the first task for each work item is to define the acceptance tests as described above.

  • Pending
  • Develop Tests
  • Ready to Code
  • Ready to Stage
  • Ready to Accept
  • Accept
  • Accepted
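The pull mechanics can be sketched in a few lines. This is only an illustration, assuming cards move exactly one column at a time; the `Board` class and its methods are hypothetical names and do not reflect how VersionOne or any other tool models a board. The column names are the ones listed above.

```python
# Column names as listed in the article, in left-to-right order.
COLUMNS = [
    "Pending",
    "Develop Tests",
    "Ready to Code",
    "Ready to Stage",
    "Ready to Accept",
    "Accept",
    "Accepted",
]


class Board:
    """Hypothetical in-memory Kanban board: card id -> column index."""

    def __init__(self):
        self.cards = {}

    def add(self, card_id):
        self.cards[card_id] = 0  # new cards start in Pending

    def pull(self, card_id):
        """Move a card one column to the right; cards never skip a state."""
        position = self.cards[card_id]
        if position == len(COLUMNS) - 1:
            raise ValueError(f"{card_id} is already Accepted")
        self.cards[card_id] = position + 1
        return COLUMNS[position + 1]

    def is_done(self, card_id):
        # Done = Accepted, as stated in the article.
        return COLUMNS[self.cards[card_id]] == "Accepted"
```

Because `pull` is the only way to move a card, nobody can push a card past a state, which is the property that forces every work item through test development and acceptance.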

In many cases a Kanban board can be drawn on a wall and the work items can be represented with sticky notes.  In our case, the team was geographically distributed, and I wanted to make sure that we were constantly relying on the captured metrics to make more informed decisions about how to keep making the process better.  We chose VersionOne as our Agile project management tool. 

One of the most valuable tools that we used was the cumulative flow chart. This chart allowed us to look at the composition of work to see how the work items were trending towards Accepted. Since we were promoting builds to the client on a weekly basis, we could track the composition of work on a weekly basis. We could also view the cumulative flow in the aggregate across many weeks to understand broader trends.

The above chart shows the cumulative flow of work items over the eight-week period that we were turning the project around.

Each of the stacked bars on the above chart is the aggregated view of the following charts on a weekly basis.

These charts are the ones we looked at on a daily basis to see if we were on track to close work items for the week. We found that having a visual mechanism like this was much more helpful for the team than simply looking at lists of defects in spreadsheets like they did previously.
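For readers who want to reproduce this kind of view from their own data, the numbers behind a cumulative flow chart are simply per-column card counts taken at regular intervals. The sketch below assumes a hypothetical snapshot format (a label plus a mapping of card id to column) and uses made-up sample data; it is not how VersionOne computes its charts.

```python
from collections import Counter

COLUMNS = [
    "Pending", "Develop Tests", "Ready to Code", "Ready to Stage",
    "Ready to Accept", "Accept", "Accepted",
]


def cumulative_flow(snapshots, columns=COLUMNS):
    """Count cards per column for each snapshot.

    snapshots is a list of (label, {card_id: column}) pairs, oldest first.
    Returns a list of (label, {column: count}) pairs, one per snapshot.
    """
    rows = []
    for label, card_columns in snapshots:
        counts = Counter(card_columns.values())
        rows.append((label, {column: counts.get(column, 0) for column in columns}))
    return rows


# Made-up data for illustration only; not taken from the project.
snapshots = [
    ("week 1", {"D-1": "Pending", "D-2": "Pending", "D-3": "Develop Tests"}),
    ("week 2", {"D-1": "Ready to Code", "D-2": "Accepted", "D-3": "Accepted"}),
]

for label, counts in cumulative_flow(snapshots):
    print(label, "Pending:", counts["Pending"], "Accepted:", counts["Accepted"])
```

Stacking each row's counts as one bar per week gives the aggregated view; a healthy trend shows the Accepted band growing while the other bands stay thin.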

Eliminate Bottlenecks

We also introduced work-in-process (WIP) limits to control the pace at which work items could work their way through the Kanban states. We observed that the analysts were not able to produce acceptance tests at a rate that kept pace with the developers. Since the developers could not keep working on new development tasks without violating the downstream WIP limits, the analysts became a bottleneck in the system. This forced the team to work together to figure out how to even out the distribution of work to ensure a consistent flow of work items. Sometimes developers had to write and verify acceptance tests (not for their own development work). We had to add more analysts and testers to the team. In some cases we adjusted the WIP limits to work out unnatural wait states.
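A WIP limit is easy to express as a guard on the pull operation. The sketch below is a hedged illustration: the function name `pull`, the exception `WipLimitReached`, and the limit values in the usage note are all hypothetical, since the article does not state the limits the team actually used.

```python
class WipLimitReached(Exception):
    """Raised when pulling a card would exceed a column's WIP limit."""


def pull(cards, card_id, target, wip_limits):
    """Move card_id into the target column unless that column is full.

    cards maps card id -> current column name.
    wip_limits maps column name -> maximum number of cards allowed there.
    Columns without an entry in wip_limits are treated as unlimited
    (an assumption made for this sketch).
    """
    in_target = sum(1 for column in cards.values() if column == target)
    limit = wip_limits.get(target)
    if limit is not None and in_target >= limit:
        raise WipLimitReached(
            f"{target} already holds {in_target} of {limit} cards"
        )
    cards[card_id] = target
```

With `wip_limits = {"Develop Tests": 2}` and two cards already in Develop Tests, a third pull into that column raises `WipLimitReached` until one of the two moves on. That refusal is what makes a bottleneck visible and pushes the team to help clear it rather than start more work.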

Ultimately, the team had to start working as a real team. They had to learn how to think about how to make the work items flow through the Kanban system. This caused a big boost in morale, and the team began to own the quality of the result as a team instead of pointing the finger at each other when they were simply matrixed onto the project to perform some specialized skill and then go back to the resource pool.

Decouple Planning and Delivery Cadences

Once we solved the problem of how to move work through the Kanban system, we had to get work items into the backlog and estimated so that the team could just pull the next work item with minimal interruptions.

Previously, the development team was simply told what work items to work on and when they were to be completed.  In addition to the problems associated with the aforementioned interrupt-driven tasking model, developers never took the deadlines seriously because they had no ownership of the work item estimates.  The project manager making the commitments to the client had no real understanding of how long it would really take to complete the work items so they just told the client what they wanted to hear.  This resulted in deadlines that were rarely met.  Eventually the project manager had lost all credibility with the development team and the client.  By declaring everything an emergency, the PM ended up creating an environment where nothing really ended up getting treated with any urgency.

In Agile approaches, it is essential that the team doing the work perform the task of estimating the work it is being asked to do.  Therefore, we had to also ensure that the estimation task itself caused minimal interruption of the development tasks.

Prioritize

If everything is important, nothing is.  One of the keys to making a continuous flow, pull system like Kanban work is for everyone on the team to have a consistent understanding of what the next most important work item is.  If everyone knows what work item to pull onto the board next, the team does not need to continuously absorb the coordination cost of figuring out what to do next.

The central mechanism for managing the full list of candidate work items is the backlog.  The backlog is a prioritized list of all the potential work items that have been identified by the product owner and users.  User stories and defects are kinds of work items.  Users and other stakeholders can request a new work item at any point in time.  When they do, those requests just go into the backlog.  Addition of work items to the backlog will have no impact on the work that is currently being completed on the Kanban board.  Rather, the backlog is a holding place for requested work items.  New work item requests simply represent a commitment on the part of the development team to have a conversation about the requested change with the requester.

All work items in the backlog are prioritized relative to one another.  So the work item at the top of the backlog is the one that has been deemed the next most important thing for the team to work on.  If the product owner cares about the order in which work items need to be completed, they must play an active role to ensure that the backlog is properly prioritized.  By the way, the relative priority of work items is constantly changing in response to changing business needs.

Previously, I mentioned that we revoked the development team manager’s ability to task developers.  We re-purposed the PM role to one of working closely with the client to force them to prioritize the work in the backlog.  And yes, they had to be forced to do this since up until that point, they were accustomed to controlling what the team worked on by throwing a tantrum, which in turn exacerbated the interrupts on the development side.  This ended up being a full-time job for the manager based on the large number of backlog items and the rate at which the client kept changing their mind about what they wanted next.

Estimate

One of the rules we put in place was that a work item could not make its way from the backlog onto the Kanban board until it had been estimated.  The reason for this was so we would not compromise our ability to report on the velocity of the team, which fed into our ability to make future commitments, based on the prior performance of the team.
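Combined with the prioritized backlog, this rule amounts to a simple gate. The sketch below is one possible reading of it, not the team's actual tooling: `WorkItem` and `next_pullable` are hypothetical names, a lower `priority` number meaning "more important" is an assumed convention, and skipping an unestimated item in favor of the next estimated one is a policy choice made for the sketch (a team could equally stop and estimate the top item first).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkItem:
    title: str
    priority: int                     # 1 = most important (assumed convention)
    estimate: Optional[float] = None  # None means not yet estimated


def next_pullable(backlog):
    """Return the highest-priority item that already has an estimate.

    Returns None when no estimated item is available, in which case
    nothing may move from the backlog onto the board.
    """
    for item in sorted(backlog, key=lambda i: i.priority):
        if item.estimate is not None:
            return item
    return None
```

Keeping unestimated items off the board means every card that reaches Accepted contributes a known amount to the team's measured velocity.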

Every Monday morning, we held an hour-long (time-boxed) estimation session for the team to collectively estimate the highest priority backlog items that had not yet been estimated.  The team estimated as many items as they could in that time period.  The estimation units were ideal days.  The estimates accounted for all the activities on the Kanban board.  Previously, what few estimates were done only took the actual coding into account.
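Estimates in ideal days make it possible to turn past throughput into a rough forecast. The following sketch assumes velocity is measured as the average number of estimated ideal days accepted per week; the function names and the sample figures in the usage note are illustrative and not taken from the project.

```python
import math


def velocity(accepted_per_week):
    """Average ideal days of work accepted per week over the observed weeks."""
    if not accepted_per_week:
        raise ValueError("need at least one week of history")
    return sum(accepted_per_week) / len(accepted_per_week)


def weeks_to_complete(remaining_ideal_days, accepted_per_week):
    """Forecast whole weeks needed to finish the remaining estimated work."""
    v = velocity(accepted_per_week)
    if v <= 0:
        raise ValueError("no completed work to base a forecast on")
    return math.ceil(remaining_ideal_days / v)
```

For example, a team that had 10, 14, and 12 ideal days accepted over three weeks has a velocity of 12, so 30 remaining ideal days forecasts to 3 weeks. The forecast is only as good as the estimates, which is one reason the whole team, not the project manager, produced them.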

By having the entire team do the estimates, they all felt more invested in delivery of the items within the estimated time periods. Since all the development activities were included in the estimate, it also forced them to understand more about what was involved in their teammates' roles. It increased their cohesion as a team.

By doing the estimation session at the beginning of the work week, we could get it over with and out of the way so the team could focus on delivery for the rest of the week and not be interrupted to do estimating when they needed to be doing development.

We observed that there is a dynamic relationship between prioritization and estimation because the product owner may choose to raise or lower the priority of work items based on how quickly they can be completed. For example, a user story that the client thought was a high priority may end up not being as important once the client learns that they can get four or five smaller stories done in the same time frame.

Conclusion

Projects get off track for a variety of reasons.  Projects that start off using traditional, predictive planning approaches are more susceptible to derailment.  This article has presented a high-level approach for containing projects that have become distressed.  Some of the methods may seem counter-intuitive such as embracing change instead of trying to control it.  The idea of letting the development team pull the next batch of work instead of pushing it to them may seem strange to some.

Using the approaches in this case study, we were able to turn this particular project around from one that was on the brink of failure to one where the client was very happy with the quality of the software they were receiving and the predictability with which it was delivered. We started seeing positive results in the first 2-3 weeks. After 8 weeks, the root causes of all the dysfunction on the team had been completely addressed and the team became largely self-sufficient and self-managing.

Equally important, the morale of the development team improved significantly.  After years of being beaten down by a barrage of unmanaged interrupts and complaints about the quality of their work they were finally given the chance to prove to the client and themselves that they, in fact, did know what they were doing and could produce a worthy product.

The development team now uses the Kanban approach for all of its development projects. It allows them to more effectively set client expectations up front and deliver predictably on commitments.

Kanban is not merely a project recovery tool. The best way to keep a project from needing to be bailed out is to employ these methods from the beginning.

About the Author

Steve Andrews is the founder of Fountainhead Solutions, LLC. His vision was to create a company focused on developing innovative software solutions using Agile methods and contemporary engineering techniques.

Mr. Andrews has an extensive background as a leader of solution development initiatives for almost 20 years. He is an expert in finding ways to maximize the effectiveness of teams to develop working solutions for challenging problems as quickly and cost-effectively as possible while also improving the lives of developers and users alike.

Mr. Andrews holds BS degrees in Computer Science and Mathematics from Vanderbilt University.
