Using Blocker Clustering, Defect Clustering, and Prioritization for Process Improvement

Teams use kanban boards to visualize work and track progress during the development process. When work is delayed (blocked), resolving the cause of that delay is an obvious way to decrease cycle time and keep work flowing smoothly. A common technique is to attach a red Post-it note with a brief reason to the blocked work ticket.

Once the block (also called a “blocker”) has been resolved, teams typically discard the red Post-it notes – but a team can use these notes to find ways to improve the system and to decide which issues to investigate and address to provide the biggest bang for the buck.

Figure 1 - An all-too-common team conundrum.

Figure 1 shows a common team conundrum. All items in the develop column are blocked, and that column is at its WIP limit (the maximum number of items allowed in this column at any time). The team is faced with a choice: increase the WIP limit for that column or solve the problems blocking the work. It is often easier to increase the WIP limit (one swipe of a whiteboard marker), but that risks increasing the end-to-end process cycle time (cycle time of work through the whole system). Little’s law is often used to demonstrate that an increase of work items in progress usually lengthens system cycle time. If we want to improve flow, getting the blocked cards moving is the better choice. In the long term, finding ways to eliminate the root causes of these delays is a superior solution, as long as it’s both possible and economical. 
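
As a rough illustration of Little’s law, average cycle time equals average WIP divided by average throughput. The sketch below uses invented numbers (a throughput of two items per week and WIP of four versus six) and assumes a stable system in which throughput does not rise when the limit is raised.

```python
def average_cycle_time(avg_wip: float, avg_throughput: float) -> float:
    """Little's law: average cycle time = average WIP / average throughput."""
    if avg_throughput <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / avg_throughput


# Hypothetical numbers: the team finishes 2 items per week on average.
throughput_per_week = 2.0

before = average_cycle_time(avg_wip=4, avg_throughput=throughput_per_week)
after = average_cycle_time(avg_wip=6, avg_throughput=throughput_per_week)

print(f"WIP 4: {before:.1f} weeks per item")  # WIP 4: 2.0 weeks per item
print(f"WIP 6: {after:.1f} weeks per item")   # WIP 6: 3.0 weeks per item
```

Under these assumptions, raising the limit from four to six adds a week to the average item without delivering anything sooner.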

This article discusses clustering blockers and provides ways to prioritize those blockers that have the most impact or are the quickest wins.

Blocker clustering

The key to performing blocker analysis is to capture the cases of impediments on the kanban or Scrum boards. Figure 2 shows blocker Post-it notes adorning work that is currently stalled (circled). After you resolve these blockers, DON’T THROW THE BLOCKER POST-ITS AWAY!

Figure 2 – A Kanban wall. Each ticket represents a piece of work in progress or on the backlog. Tickets with a red (or pink) Post-it (circled) are currently blocked.

Gather the Post-it notes of resolved blockers on a separate board. Make sure each blocker has enough text to remind the team of the cause of the blocker – for example, “Text and photos for reservation page missing” as shown in Figure 3.

Capturing the blocked time is important for prioritization of the most impactful blockers. The easiest method for the team is to add a tick mark during each day’s stand-up to count total days blocked. Some teams record the start date on the bottom left of the Post-it and the resolved date on the bottom right. Figure 3 shows how days blocked can be recorded by marks.

Figure 3 – Capture the durations of your blockers.
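
If a team records start and resolved dates instead of tick marks, the blocked time can be derived afterwards. The following is a minimal sketch; the dates are invented, and counting only Monday to Friday is an assumption meant to match one tick per daily stand-up.

```python
from datetime import date, timedelta


def days_blocked(started: date, resolved: date) -> int:
    """Calendar days between raising the blocker and resolving it."""
    if resolved < started:
        raise ValueError("resolved date is before start date")
    return (resolved - started).days


def working_days_blocked(started: date, resolved: date) -> int:
    """Monday-to-Friday days from the start date (inclusive) to the resolved date (exclusive)."""
    count = 0
    current = started
    while current < resolved:
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            count += 1
        current += timedelta(days=1)
    return count


# Hypothetical note: raised on Monday 2 March 2015, resolved the following Monday.
started, resolved = date(2015, 3, 2), date(2015, 3, 9)

print(days_blocked(started, resolved))          # 7
print(working_days_blocked(started, resolved))  # 5
```

Whichever count you choose, use the same one for every note so that the totals are comparable.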

The next step for the team is to cluster the blockers by internal and external causes and then group them by some form of commonality. It’s important to tally the total days blocked in each group you’ve formed. This tally of total blocking time is the first metric you can use to determine which blocker types are more harmful than others.

Figure 4 shows the first groupings formed and ranked. “Content missing” has the longest blocking time at 109 days, followed by “waiting for backend” at 87 days. Spending time on the first two or three groups avoids spending time drilling into impediment detail for blockers that rarely occur or were only blocking for a small amount of time.

Figure 4 – Group by internal/external, then by commonality.
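
Teams that keep their notes in a spreadsheet or a tracking tool can do the same tally in a few lines. This sketch uses made-up notes, and the internal/external labels are assumptions; it sums the days blocked per group and ranks the groups from most to least blocked time.

```python
from collections import defaultdict

# Hypothetical resolved blocker notes: (origin, cluster, days blocked).
notes = [
    ("external", "content missing", 12),
    ("external", "content missing", 9),
    ("external", "waiting for backend", 7),
    ("internal", "test environment not ready", 4),
    ("external", "waiting for backend", 10),
    ("internal", "test environment not ready", 3),
]


def rank_clusters(notes):
    """Sum days blocked per (origin, cluster) and sort by total, largest first."""
    totals = defaultdict(int)
    for origin, cluster, days in notes:
        totals[(origin, cluster)] += days
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for (origin, cluster), total in rank_clusters(notes):
    print(f"{total:>3} days  {origin:<8}  {cluster}")
```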

Having prioritized the top candidates by impact, the next step is to understand the problems, identify the root causes, and brainstorm solutions. Understanding the problem seems simple enough, but it often takes asking “why?” many times. The five-whys method is a good root-cause facilitation technique. Using the method, a team can focus on ideas that solve the root cause of the problem. Figure 5 provides an example of the five-whys technique in which a team has identified a root cause after the third question and defined an action: “Maybe it’d be smart to tell them always….” In this case, the team decides that telling the customer about upcoming work by adding a reminder after analysis is a good experiment. It’s an experiment because the solution may not work. Such experiments must be tracked and revisited (for example, set a calendar reminder for next month) to confirm whether the solution reduced the content problems or whether there was no decrease and another solution must be tested.

Figure 5 – Keep asking why until the team has a process-change experiment to run. Remember to analyze whether or not this experiment has resulted in the intended consequence.

Defects are blockers too

Although teams and tools often track defects differently from blockers, defects can be clustered like blockers to investigate their root causes and to solve them in an economically sensible way. Defects not only impede work in progress, they also block the team from starting other work by tying up the developers and testers who are correcting the problems. The root causes of defects can be analyzed at the same time as blockers, allowing prioritization to consider both sources of impediments as shown in Figure 6 and Figure 7.

Figure 6 – Blue notes are defects; red are blockers. Capture defects and blockers together and analyze as a whole.

Figure 7 – Defects can be clustered and prioritized using the same techniques as blockers.

Simple intuition fails for system-level prioritization

Although it would seem like common sense to first attend to the root causes of the defect and blocker clusters with the most impact as measured by delay, it is not that simple. Some root causes may be too complex or expensive to fix. Some blocker clusters might precede a constraint in the system, and fixing them just hurries work into another constraint where it will wait anyway. In most cases, intuition gets close, but we are often fooled by biases, some of which might be:

  • The ones “we” own - we gravitate to solving root causes we can take control of as an individual or team.
  • The simple ones - we want to get a quick endorphin rush of fixing something, anything!
  • The most recent - we’re biased toward fixing the most recent blockers.
  • Thinking all fixes cost the same to resolve, or that fixing has no cost at all.

Being aware of these biases is the first step in avoiding them. Having a system, even a simple one, for prioritizing blockers will help you to move beyond intuition and reap larger benefits.

Three quick rules for blocker prioritization

When choosing which blockers to resolve, a few rules can help to identify easy wins and to avoid fixes that will not result in better throughput or cycle time.

The first rule is to avoid fixes that aren’t cost effective. For example, some work might be waiting for a clean test environment, but if this is a rarity then buying, building, and managing a complex environment may not be a cost-effective solution. Even in that new environment, will the staff be available to maintain and execute the tests and do deployments? You may spend a considerable amount of money to have work waiting for someone to run the tests in your shiny, new environment.

The second rule is to look for blockers that occur at a constraint of the system. Kanban systems make constraints easy to detect by how full the preceding queue or buffer column is, or how much work is complete but unable to be pulled into the constrained column. If there are blockers in the constrained column, look to resolve these first. This advice follows the theory of constraints (TOC) work by Eliyahu Goldratt, in which he documents five focusing steps for constraint management in a manufacturing context.

Figure 8 – Three rules of blocker removal.

The third rule is to give resolution effort and blocker impact equal weight. A simple matrix approach to prioritizing and ordering blockers helps quickly balance solvability and impact. Figure 8 shows the rules and a sample solvability matrix. Fix a 1 before a 2, etc.

Figure 9 shows how to use the solvability matrix. The “content missing” cluster is easy to solve because it only requires informing the customer earlier that work will be started soon. At the moment, this also creates the most blocked time, so it earns a 1 on our scheduling matrix.

The “test environment not ready” cluster is moderately easy to solve and has medium impact, so we score it a 4.

The “waiting for backend” cluster has almost double the blocked time of the “test environment not ready” cluster but is harder to solve, so it gets a 5. Our prioritization scheme indicates that there is more benefit in solving “test environment not ready” before solving the back-end issues.

Figure 9 – Example of prioritizing the blocker clusters based on time blocked and solvability.
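
A matrix like this is easy to encode as a lookup table. The figure is not reproduced in this text, so the matrix below is an assumption: its cells were filled in only so that they agree with the three scores quoted above (1, 4, and 5), and the “high” impact rating for “waiting for backend” is likewise assumed.

```python
# Hypothetical scheduling matrix: SCORE[impact][solvability] -> order (1 = fix first).
# Only the scores 1, 4, and 5 are quoted in the article; every other cell,
# and the "high" impact rating for "waiting for backend", is an assumption.
SCORE = {
    "high":   {"easy": 1, "moderate": 3, "hard": 5},
    "medium": {"easy": 2, "moderate": 4, "hard": 7},
    "low":    {"easy": 6, "moderate": 8, "hard": 9},
}

# (cluster name, impact, solvability)
clusters = [
    ("waiting for backend", "high", "hard"),
    ("test environment not ready", "medium", "moderate"),
    ("content missing", "high", "easy"),
]


def prioritize(clusters):
    """Order clusters by their matrix score, lowest score first."""
    return sorted(clusters, key=lambda c: SCORE[c[1]][c[2]])


for name, impact, solvability in prioritize(clusters):
    print(SCORE[impact][solvability], name)
# 1 content missing
# 4 test environment not ready
# 5 waiting for backend
```

Replace the table with your own team’s matrix; the point is only that the ordering comes from both ratings together rather than from blocked days alone.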

Simulation approach

The methods used up to this point are qualitative in nature. The individual days of impact are quantitative, but in a complex system they aren’t a good indication of priority because they ignore work sitting idle (work that is not blocked but is not progressing either) because of constraints in the system – for example, staffing delays or waiting for test environments. To uncover which blockers actually improve the flow of items from the beginning to the end of a system, we need deeper analysis.

The most common quantitative analytical techniques used in other fields are simulation and Monte Carlo sensitivity analysis, and Monte Carlo software is the usual tool for performing this sensitivity analysis on blockers. This method isn’t for the faint-hearted, and is recommended only after the simpler heuristic techniques have failed to gain traction.

Monte Carlo analysis is only starting to find its place in software development. A Monte Carlo tool specifically designed to model software development and perform sensitivity analysis on blockers is KanbanSim and ScrumSim, available free from Focused Objective. This tool builds models of development systems in a text file format for simulation and analysis. Once the models represent a system, you can perform experiments on the model. One such experiment is sensitivity analysis, in which you reduce the frequency and blocking time of each blocker and measure the impact of that by simulating the completion of an entire project thousands of times. By systematically testing each modeled blocker, you can identify the blocker that causes the most impact in the entire project. This true system-level analysis of blockers lets you prioritize blockers and defects by their impact on project delivery.  

A typical sensitivity analysis using Monte Carlo simulation would follow these steps:

  1. Build a computer model to represent the system being mimicked.
  2. Use the model to generate a baseline forecast and test its accuracy against a known result – say, over the prior three months. This tests how well the model represents reality.
  3. Reduce each blocker or defect element of the model by 50% (one at a time) and generate a new forecast. Each of these forecasts should reach completion earlier than the baseline did. The number of days saved indicates the priority of the tested blocker or defect: the greater the reduction in days, the higher the priority of fixing it.
  4. Sort the results of all test runs from highest impact to lowest impact. The team brainstorms experiments that will help to resolve the most impactful.
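
The steps above can be sketched in a few dozen lines of Python. This is a toy model, not SimML or the KanbanSim and ScrumSim tool: every number is invented, the system has just two single-lane stages (develop, then test), and “reduce by 50%” is taken to mean halving both the frequency and the delay of one blocker at a time. Each scenario reuses the same random seeds so that differences come from the tweak rather than from luck.

```python
import random
from statistics import median

# Hypothetical model: every value below is invented for illustration.
ITEMS = 40
STAGE_DAYS = {"dev": (1.0, 3.0), "test": (2.0, 5.0)}  # test is the slower stage
BLOCKERS = {
    "missing requirements": {"stage": "dev", "chance": 0.30, "delay": (2.0, 6.0)},
    "test environment not ready": {"stage": "test", "chance": 0.20, "delay": (2.0, 6.0)},
}


def simulate_once(blockers, seed):
    """One project run: items pass through dev, then test, one item per stage at a time."""
    rng = random.Random(seed)
    dev_done = test_done = 0.0
    for _ in range(ITEMS):
        days = {stage: rng.uniform(lo, hi) for stage, (lo, hi) in STAGE_DAYS.items()}
        for blocker in blockers.values():
            # Always draw both numbers so every scenario consumes the same random stream.
            occurs = rng.random() < blocker["chance"]
            delay = rng.uniform(*blocker["delay"])
            if occurs:
                days[blocker["stage"]] += delay
        dev_done += days["dev"]
        # Test cannot start an item before dev finishes it or before the previous test ends.
        test_done = max(dev_done, test_done) + days["test"]
    return test_done


def forecast(blockers, runs=500):
    """Median project duration in days over many simulated runs."""
    return median(simulate_once(blockers, seed) for seed in range(runs))


def sensitivity(blockers, factor=0.5, runs=500):
    """Scale one blocker at a time and rank blockers by days saved against the baseline."""
    baseline = forecast(blockers, runs)
    saved = {}
    for name, blocker in blockers.items():
        lo, hi = blocker["delay"]
        tweaked = dict(blockers)
        tweaked[name] = {**blocker, "chance": blocker["chance"] * factor,
                         "delay": (lo * factor, hi * factor)}
        saved[name] = baseline - forecast(tweaked, runs)
    return baseline, sorted(saved.items(), key=lambda kv: kv[1], reverse=True)


baseline, ranking = sensitivity(BLOCKERS)
print(f"baseline: {baseline:.1f} days")
for name, days in ranking:
    print(f"{days:6.1f} days saved  {name}")
```

In this toy model the development blocker occurs more often, so a tally of raw blocked days would rank it first. The simulation ranks the test-environment blocker first instead, because test is the constraint and time saved upstream mostly turns into waiting in front of it. A real model would need step 2, validation against known history, before its ranking could be trusted.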

Figure 10 – KanbanSim and ScrumSim is a Monte Carlo tool for modeling software development.

Figure 10 shows a screenshot of KanbanSim and ScrumSim. The tool visually depicts the project simulations it uses to forecast and analyze sensitivity. Once the model represents a system, you can run a simulation hundreds or thousands of times. During each run, the time taken for each work item and the rate at which items are blocked are randomly chosen from your actual historical ranges. The tool returns the fastest and slowest outcomes, as well as the most common ones. This lets you choose a more statistically probable result. Once this model is working, it's simple to have the tool perform automatic sensitivity analysis to rank the blockers and defects from most impactful to least.
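
The idea of sampling from history and then reading off fastest, slowest, and typical outcomes can be shown without any special tool. The sketch below is not how KanbanSim and ScrumSim works internally; it assumes invented historical samples, items completed one after another, and the 85th percentile as an example of a more cautious forecast.

```python
import random
from statistics import quantiles

# Hypothetical history: days each finished item took, and days each item lost to blockers.
HISTORIC_ITEM_DAYS = [2, 3, 3, 4, 5, 5, 6, 8]
HISTORIC_BLOCKED_DAYS = [0, 0, 0, 0, 1, 2, 3, 5]
REMAINING_ITEMS = 25


def simulate_project(rng):
    """Total days for the remaining items, sampling each item from history (serial flow assumed)."""
    return sum(
        rng.choice(HISTORIC_ITEM_DAYS) + rng.choice(HISTORIC_BLOCKED_DAYS)
        for _ in range(REMAINING_ITEMS)
    )


rng = random.Random(42)  # fixed seed so the run is repeatable
outcomes = sorted(simulate_project(rng) for _ in range(2000))

cuts = quantiles(outcomes, n=100)  # 99 percentile cut points
print("fastest:", outcomes[0])
print("median :", cuts[49])
print("85th   :", cuts[84])
print("slowest:", outcomes[-1])
```

Quoting a percentile rather than the single fastest or average run is what lets you attach a likelihood to a forecast.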

Figure 11 – Models are text files using a specific language named SimML.

Figure 11 shows how different blocking events are modeled in SimML, a specific text-based language. Lines 80 to 90 contain two blockers with estimated frequencies and delays that are based on historical data that was gathered during the development process. These estimates will be altered by a fixed percentage during sensitivity analysis to see which blocker has the most impact over hundreds of simulation runs.

Figure 12 shows the sensitivity report that KanbanSim and ScrumSim generates. The tool has not only tested blockers and defects for impact, but has also altered other sources of potential improvement such as sources of scope creep and the cycle time of each process step. This report shows that expedite items (work of a “drop everything” urgency such as production issues) have the most overall impact, followed by blocking of the test environment, then blocking of development due to missing requirements. The Intervals Delta column shows the expected improvement from a 10% reduction for each tweaked blocker.

Figure 12 – KanbanSim and ScrumSim sensitivity report.

This example only scratches the surface of how simulation and Monte Carlo analysis can be used to find system-level impacts and improvement opportunities. Further reading on Monte Carlo techniques can be found in the following articles, books, and videos:

Summary

Finding and solving delays in software-development systems and projects is a key technique for reducing costs and improving delivery predictability. Picking which blockers or defect causes offer the biggest and quickest rewards for effort is often too hard for intuition alone. The methods above let you record, cluster, and analyze blockers and defects to quickly find the low-hanging fruit and to determine an optimal prioritization of the others. You can use a range of techniques from qualitative to advanced simulation. Start capturing and using blocker and defect information to drive process improvement and speed up the flow of work to happy customers.

About the Authors

Troy Magennis (and Aiden) has been involved with technology companies since 1994, fulfilling roles from QA to VP for multinationals. Troy speaks at many agile conferences and has trained and mentored executives in agile in small and large organizations. Previous clients include Wal-Mart, Microsoft, Skype, Sabre Airline Solutions, Siemens Healthcare, and Tableau. Troy currently consults and trains organizations wanting to improve decision making on software portfolios and projects through agile and lean thinking and tools, applying Scrum and lean techniques appropriately and where they are going to deliver the biggest benefit through quantitative rigor. Troy has written many books and articles on software development and practices; his most recent book is Forecasting and Simulating Software Development Projects: Effective Modeling of Kanban & Scrum Projects using Monte Carlo Simulation. Troy can be contacted at troy.magennis@focusedobjective.com and you can follow him on Twitter at @t_magennis.

 

Klaus Leopold is a computer scientist with many years of experience in helping organizations from different industries along their improvement journey with lean and kanban. He is co-author of Kanban Change Leadership (to be published by Wiley in May 2015). Klaus was one of the first lean kanban trainers and coaches worldwide. He was awarded the Brickell Key Award for outstanding achievement and leadership within the kanban community in 2014. Klaus is also a founding member of the management network Stoos. His main interest is agility beyond team level, especially in large projects and programs from 30 to 500 people. He publishes his current thoughts on his blog www.klausleopold.com and you can follow him on Twitter at @klausleopold.
