
How Agile Teams Can Improve Predictability by Measuring Stability

Key Takeaways

  • Predictability, the answer to "when will we be done?", is the holy grail for organisations and managers.
  • In agile, predictability stems from assumed system stability through measures like WIP limits and velocity.
  • We have developed a new metric, the Stability Metric (Ψ), using queueing theory to test assumptions of system stability in agile systems. We tested it against millions of data points from historic projects over the last 20 years.
  • We found that 73% of projects analysed were not stable / predictable. These systems had overloaded teams and large, growing backlogs. This suggests that assumptions of stability in agile are not necessarily always true.
  • You can use ψ as a tool to help your team improve its predictability.

Around the corner from Rob’s house is a small convenience store. It is like similar chain stores around the world. Every so often, a queue of people who have finished shopping and need to check out will form in front of the checkouts. When this happens, one of the cashiers will ring a bell, and other members of staff will open more tills and start serving people in the queue. These extra tills remain open until the queue dissipates, at which point the staff return to other work, complete training or take breaks. As a system, it is elegant, efficient and effective.

Around the world, there is a shortage of software developers. Developers are reporting increasing levels of mental health concerns from stress, burnout and boreout. Agile systems of working have been around for more than two decades and are very popular ways of developing and delivering software.

Agile promotes a pace of work that should be sustainable indefinitely - minimising both burnout and boreout. Agile frameworks also promote visualising knowledge work through lists of Product Backlog Items (PBIs) created by stakeholders which are then serviced by one or more teams of developers.

Past practitioners and researchers have focused on the speed at which these backlogs are serviced. This focus has been criticised for creating the conditions of a "feature factory", where output is prioritised without attention to actual user needs.

Our research uses established queueing theory to look at both supply and demand, modelling agile systems as networks of queues. Using this model, we hope to diagnose whether a system's design is adequate to satisfy the needs of organisations, their employees, customers, users and shareholders.

In this article, we present our approach for analysing agile systems as networks of queues and how we have used it to analyse 926 projects in the Public Jira Dataset. We explain how you can measure the Stability Metric (Ψ) for your queues. Finally, we present our planned next phase of research.

A brief introduction to queues and how they apply to agile

Survey results from past State of Agile reports suggest that scrum and kanban dominate when it comes to agile. Both use queues for transparency and visibility, and both rely on assumptions of stability. Kanban frameworks, for example, assume that Little’s Law holds, which is only the case when the queueing system is stable, that is, when the average service (departure) rate is higher than the average arrival rate. Scrum systems tend to use the "yesterday’s weather" approach, where the velocity delivered in recent sprints is used to predict likely capacity for upcoming ones.
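As an illustration of the "yesterday’s weather" approach, the short Python sketch below forecasts the next sprint's capacity as the mean velocity of the most recent sprints. The function name `yesterdays_weather` and the default window of three sprints are assumptions for illustration; some teams use only the last sprint. The forecast is only as trustworthy as the stability assumption behind it.

```python
from statistics import mean
from typing import Sequence


def yesterdays_weather(velocities: Sequence[float], sprints: int = 3) -> float:
    """Forecast next sprint's capacity as the mean velocity of recent sprints.

    `sprints` (default 3) is an illustrative assumption; some teams use
    only the most recent sprint.
    """
    if sprints < 1:
        raise ValueError("sprints must be at least 1")
    recent = list(velocities)[-sprints:]  # most recent sprints only
    if not recent:
        raise ValueError("need at least one completed sprint")
    return mean(recent)
```

For example, `yesterdays_weather([21, 18, 25, 20, 23])` averages the last three sprints (25, 20 and 23), giving about 22.7 points.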

In scrum, there are typically two queues, which the Scrum Guide refers to as "backlogs": the Product Backlog and the Sprint Backlog. From a queueing perspective, the Sprint Backlog is stable when all the work in it is done. In high-performing teams with low levels of carryover between sprints, this might happen as often as every sprint. Lower-performing teams, with lots of unplanned work such as production defects, might only burn down their Sprint Backlog to zero once every few sprints. This is less ideal, but still stable. However, the Sprint Backlog is only a sub-queue of the scrum system, and the Product Backlog can be where stability challenges exist. Figure 1 shows a simple kanban queueing system with a Work In Progress (WIP) limit of three items. Real systems can have multiple columns and swimlanes, each of which creates complex sub-queues within the overall system.

Figure 1. A kanban system can be modelled as a queue

How to measure predictability from queue system stability

The stability metric, Ψ, is a simple calculation that can be done on the back of an envelope. It has two inputs: the arrival rate, λ, and the service rate, μ. The arrival rate is the number of PBIs added to a system in a period of time. The service rate is the number of PBIs completed by the team in the same period of time. With these two inputs, you can calculate the dimensionless Ψ by dividing the service rate by the arrival rate. When Ψ is less than one, the system is unstable; when it is greater than one, it is stable; and when it is equal to one, it is marginally stable.


\(\Psi = \frac{\mu}{\lambda}\)
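A minimal Python sketch of the calculation, assuming you have already counted arrivals (λ) and completions (μ) over the same time window, so that the window length cancels out of the ratio. The function names and the optional `tolerance` parameter are hypothetical; the thresholds follow the article's interpretation of Ψ.

```python
def stability_metric(arrivals: int, completions: int) -> float:
    """Return Ψ = μ / λ for counts taken over the same time window.

    Because both counts share one window, the ratio of counts equals
    the ratio of rates.
    """
    if arrivals <= 0:
        raise ValueError("need at least one arrival in the window to compute Ψ")
    return completions / arrivals


def classify(psi: float, tolerance: float = 0.0) -> str:
    """Map Ψ onto the stable / marginally stable / unstable interpretation.

    `tolerance` is a hypothetical knob for treating values close to 1
    as marginally stable.
    """
    if abs(psi - 1.0) <= tolerance:
        return "marginally stable"
    return "stable" if psi > 1.0 else "unstable"
```

For example, `classify(stability_metric(arrivals=40, completions=30))` returns `"unstable"`, since Ψ = 0.75.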


When Ψ is equal to one, the average arrival rate is equal to the average service rate and Little’s law applies. In this state, the backlog is neither growing nor shrinking over time and the average time an item will spend before it is done can be calculated by dividing the total number of items in the system, L, by the arrival rate, λ.
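Little’s Law can be rearranged into a one-line estimate of how long an item spends in the system. The sketch below uses hypothetical names and assumes the system is stable, so that long-run averages are meaningful, and that L and λ are measured in consistent units (for example, items and items per day).

```python
def expected_time_in_system(items_in_system: float, arrival_rate: float) -> float:
    """Little's Law rearranged: W = L / λ.

    Only meaningful when the queue is stable, so that the long-run
    arrival and departure rates balance.
    """
    if arrival_rate <= 0:
        raise ValueError("arrival rate must be positive")
    return items_in_system / arrival_rate
```

With 60 items in the system and λ = 2 items per day, `expected_time_in_system(60, 2.0)` gives an average of 30 days per item.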

When Ψ is less than one, then items are arriving faster than they can be dealt with and the backlog is growing. These systems are not inherently predictable because even though the service rate is known, the waiting time before the team deals with the item cannot be easily predicted.
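To see why an unstable system loses predictability, the following sketch projects the backlog size day by day under constant average rates. This is a deliberately simple deterministic (fluid) approximation, an assumption on our part rather than part of the research: real arrivals and completions are bursty, so treat the output as a trend, not a forecast. The function name is hypothetical.

```python
from typing import List


def project_backlog(initial: float, arrival_rate: float,
                    service_rate: float, days: int) -> List[float]:
    """Backlog size at the end of each day, assuming constant average rates.

    The backlog cannot drop below zero: once it is empty, the team is
    idle until new work arrives.
    """
    backlog = float(initial)
    history = []
    for _ in range(days):
        backlog = max(0.0, backlog + arrival_rate - service_rate)
        history.append(backlog)
    return history
```

With 10 items, λ = 2 and μ = 1.5 per day (Ψ = 0.75), the backlog grows by half an item every day without limit; with λ = 1 and μ = 2 (Ψ = 2), a backlog of 3 drains to zero within three days and the team then waits for work.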

Example: worst to best in five quarters

Ψ is best explained with an example. In his seminal book on kanban, David J. Anderson describes the initial study that led to the creation of the kanban framework in the chapter "Worst to best in five quarters". In it, he describes how the team he was working with had an average time between completed tickets of 11 days. This gives a value of μ of 1/11 tickets per day, or 0.091 tickets per day on average.

Through the changes they introduced, the average time between new tickets arriving grew to 14 days (arrivals had previously been more frequent). This new arrival rate gives a value of λ of 1/14 tickets per day, or 0.071 tickets per day on average. The system that gave rise to kanban therefore had Ψ = (1/11) ÷ (1/14) = 14/11, or about 1.27. This means that the updated system was stable, since tickets were exiting the system faster than they were arriving. It is easy to see why this system outperformed its peers over five quarters.
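The arithmetic in this example can be checked with Python's `fractions` module, which keeps the ratio exact. Working with exact fractions gives Ψ = 14/11 ≈ 1.27; dividing the rounded rates (0.091 / 0.071) gives about 1.28. Either way, Ψ > 1 and the system is stable.

```python
from fractions import Fraction

# Mean times between events, in days, as described in the example above.
days_between_completions = 11
days_between_arrivals = 14

mu = Fraction(1, days_between_completions)   # service rate, tickets/day
lam = Fraction(1, days_between_arrivals)     # arrival rate, tickets/day
psi = mu / lam                               # exact ratio: 14/11

print(f"mu={float(mu):.3f}, lambda={float(lam):.3f}, psi={float(psi):.2f}")
# prints: mu=0.091, lambda=0.071, psi=1.27
```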

How prevalent are stable agile systems?

Our research to date has focused on the Public Jira Dataset, a collection of Jira data from multiple organisations spanning several decades. It was published by Montgomery et al. in 2022 and is publicly available. Using this data, we analysed arrival and service rates for work completed successfully in Jira projects with more than 30 data points since 2002. This gave 926 projects from 12 organisations and over 1.6 million data points.

Most of the systems we analysed were not stable: 73% of all projects had Ψ values of less than one, as shown in Figure 2. This means that their backlogs are growing and that they are not predictable. However, there was a cluster of projects around Ψ = 1, suggesting that these organisations and projects were independently finding ways to get their work done.

Figure 2. Number of Jira projects that fall within each range of Ψ for the 926 projects in the Public Jira Dataset

The outlier bins in Figure 2 indicate potentially troubled projects. The bin on the far left-hand side shows projects where ten PBIs arrived in the backlog for each item delivered by the team, likely a very stressful situation. The bin on the far right-hand side shows projects whose teams have nothing to do more than 50% of the time. Hopefully, these are part-time teams; otherwise, it may be painfully boring to work on these projects.
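A sketch of how Ψ values for many projects could be grouped into histogram bins like those in Figure 2. The bin edges below are hypothetical, since the exact ranges used in the figure are not given here; each bin is half-open, so a project with Ψ exactly 1.0 lands in the [1.0, 1.25) bin. The helper `share_unstable` computes the fraction of projects with Ψ < 1, the kind of statistic behind the 73% figure. All names are illustrative.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Iterable, Sequence

# Hypothetical bin edges; Figure 2's exact ranges are not reproduced here.
BIN_EDGES = (0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0)


def bin_label(psi: float, edges: Sequence[float] = BIN_EDGES) -> str:
    """Return the half-open range [lower, upper) that contains psi."""
    i = bisect_right(edges, psi)
    lower = edges[i - 1] if i > 0 else 0.0
    upper = edges[i] if i < len(edges) else float("inf")
    return f"[{lower}, {upper})"


def psi_histogram(psi_by_project: Dict[str, float]) -> Counter:
    """Count how many projects fall into each Ψ range."""
    return Counter(bin_label(psi) for psi in psi_by_project.values())


def share_unstable(psi_values: Iterable[float]) -> float:
    """Fraction of projects whose Ψ is below 1."""
    values = list(psi_values)
    return sum(1 for p in values if p < 1.0) / len(values)
```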

We also found that unstable systems tended to have much larger backlogs on average than stable systems, and that the average time between delivering one PBI and the next was longer for unstable teams. Both findings suggest that there could be value in teams using the stability metric. However, we caution that the research is still in progress.

Practical insights from Ψ

A likely core insight from using the metric is whether a team or organisation is designed appropriately for the work it is trying to achieve. To improve predictability, scrum masters can apply the metric to historic data to determine system stability. Then, depending on the result, they can adapt the system to change its stability and, with it, its predictability.

For example, a low Ψ might indicate a need to increase the deployment frequency or decrease the lead time for changes. A large backlog on its own is not a reason for concern, but a large and growing backlog may require changes to processes and teams to bring it under control. A high value of Ψ may indicate the need to discover new features or work for the team - especially if combined with a small backlog. Each change a team makes will result in changes to system stability either through changing arrival rates, service rates, or both. Measuring before and after the change allows us to determine the impact.
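To measure before and after a change, Ψ can be computed over equal-length windows on either side of the change date. The sketch below assumes you already have lists of created and resolved dates (for example, from the export described in the step-by-step guide at the end of this article); the 90-day default window and the function names are illustrative assumptions.

```python
from datetime import date, timedelta
from typing import Sequence, Tuple


def psi_in_window(created: Sequence[date], resolved: Sequence[date],
                  start: date, end: date) -> float:
    """Ψ for the half-open window [start, end): completions / arrivals."""
    arrivals = sum(1 for d in created if start <= d < end)
    completions = sum(1 for d in resolved if start <= d < end)
    if arrivals == 0:
        raise ValueError("no arrivals in window; choose a longer window")
    return completions / arrivals


def psi_before_after(created: Sequence[date], resolved: Sequence[date],
                     change_date: date, window_days: int = 90) -> Tuple[float, float]:
    """Compare Ψ over equal windows immediately before and after a change."""
    window = timedelta(days=window_days)
    before = psi_in_window(created, resolved, change_date - window, change_date)
    after = psi_in_window(created, resolved, change_date, change_date + window)
    return before, after
```

A rise from below 1 to above 1 would suggest that the change moved the system towards stability, although a single pair of windows can be noisy.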

Planned next steps

In the next phase of our research, we plan to introduce Ψ in controlled trials to begin assessing how changes in stability affect system performance. We plan to use multiple teams working with different agile systems, observing them over several months to measure how changes in stability affect predictability, teamwork and morale. This will extend our insights from historic data to the impact of using Ψ in real-world, dynamic situations.

How to measure Ψ for your agile system

If you would like to try the stability metric, we suggest the following steps. They are written for Jira, but similar steps can be used for any PBI-visualisation system.

  1. In Jira, open a team's board and open the board's filter query
  2. Edit the filter query to remove epics, subtasks and any resolutions that typically denote work not completed by the team such as "Won’t Do", "Duplicate", "Cancelled" and so on.
  3. Edit the columns to show at least:
    • a. Key
    • b. Project
    • c. Issue Type
    • d. Created
    • e. Resolved
    • f. Status
  4. Export the fields to Excel and convert them into a table
  5. Sort the table from Oldest to Newest on the Created column
  6. Count the number of tickets created in the last three months (or a similar timebox that gives 30 or more PBIs). This gives λ
  7. Sort the table from Oldest to Newest on the Resolved column
  8. Count the number of tickets resolved in the last three months (or a similar timebox that gives 30 or more PBIs). This gives μ
  9. Divide μ by λ to give Ψ.
  10. Analyse using the following:
    • a. If Ψ is greater than 1, your system is stable
    • b. If Ψ is less than 1, your system is unstable
    • c. If Ψ is equal to 1, your system is marginally stable
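Steps 4 to 9 can also be scripted instead of done in Excel. The Python sketch below reads a CSV export with `Created` and `Resolved` columns, counts how many tickets were created (λ) and resolved (μ) in the window ending at a given date, and returns Ψ. It assumes the filtering in step 2 has already been applied in Jira. The `DATE_FORMAT` value is a placeholder, not Jira's actual format: the export's date format depends on your instance settings, so adjust it to match your file. The 91-day default approximates three months, and all function names are hypothetical.

```python
import csv
from datetime import datetime, timedelta
from typing import Iterable, Mapping, Optional, Tuple

# Placeholder: change this to match the date format in your Jira export.
DATE_FORMAT = "%Y-%m-%d %H:%M"


def parse_date(value: Optional[str]) -> Optional[datetime]:
    """Parse a date cell; unresolved tickets have an empty Resolved cell."""
    value = (value or "").strip()
    return datetime.strptime(value, DATE_FORMAT) if value else None


def psi_from_rows(rows: Iterable[Mapping[str, str]], end: datetime,
                  days: int = 91) -> Tuple[int, int, float]:
    """Count arrivals and completions in [end - days, end); return (λ, μ, Ψ)."""
    start = end - timedelta(days=days)
    arrivals = completions = 0
    for row in rows:
        created = parse_date(row.get("Created"))
        resolved = parse_date(row.get("Resolved"))
        if created is not None and start <= created < end:
            arrivals += 1
        if resolved is not None and start <= resolved < end:
            completions += 1
    if arrivals == 0:
        raise ValueError("no tickets created in the window; widen the timebox")
    return arrivals, completions, completions / arrivals


def psi_from_csv(path: str, end: datetime, days: int = 91) -> Tuple[int, int, float]:
    """Read a CSV export and compute (λ, μ, Ψ) for the window ending at `end`."""
    with open(path, newline="", encoding="utf-8") as f:
        return psi_from_rows(csv.DictReader(f), end, days)
```

For example, `arrivals, completions, psi = psi_from_csv("export.csv", end=datetime(2024, 4, 1))` (with a hypothetical file name) gives λ, μ and Ψ for the three months to 1 April 2024. As in steps 6 and 8, widen the window if either count is below 30.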
