
Velocity and Better Metrics: Q&A with Doc Norton


Key Takeaways

  • Velocity forecasts are usually only around 50% probable. You're betting on a coin toss.
  • Monte Carlo simulations are a far better means of forecasting.
  • Avoid setting targets for measurements.
  • Focus on trends, not single data points.
  • Measure multiple aspects of the system.

Velocity is not good for predictions or diagnostics, argued Doc Norton at Experience Agile 2019. It is a lagging indicator of a complex system, too volatile to tell us what our future performance will be and not stable enough to be relied upon.

Norton defined velocity as work units over time toward the delivery of value. It needs direction to be a sensible measurement, he noted. But even when velocity is connected to actual deployment or delivery, it still doesn't tell you anything about the process. With just this measure, we can't tell whether a team is doing well or not, he said.

To get higher probability predictions, you need to know your velocity history, backlog size, start date, and the split rate, which indicates how quickly your backlog grows. Norton showed how we can use Monte Carlo simulation to forecast when a team would be able to deliver. The higher the confidence level we want, the more time the team will need before they can deliver.

For diagnostics, Norton showed how we can use cumulative flow diagrams. They can help us to track the amount of work done and not done over time, see the changes in scope, and spot where the bottlenecks are.
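The signals Norton attributes to a cumulative flow diagram can be sketched numerically. The snapshot counts below are made-up illustration data, not from the talk; the sketch shows work done versus not done over time, a scope change, and a widening in-progress band that hints at a bottleneck:

```python
# Hypothetical daily snapshots: counts of work items per state.
snapshots = [
    {"todo": 20, "in_progress": 3, "done": 0},
    {"todo": 18, "in_progress": 6, "done": 1},
    {"todo": 19, "in_progress": 9, "done": 2},   # todo went up: scope grew
    {"todo": 17, "in_progress": 12, "done": 3},
]

# Work done vs. not done over time (the stacked bands of a CFD).
for day, s in enumerate(snapshots, start=1):
    total = sum(s.values())
    print(f"day {day}: done={s['done']} not_done={total - s['done']} total={total}")

# Change in total height = change in scope.
scope_change = sum(snapshots[-1].values()) - sum(snapshots[0].values())
# A band that keeps widening marks the stage where work piles up.
wip_growth = snapshots[-1]["in_progress"] - snapshots[0]["in_progress"]
print(f"scope change: {scope_change:+d} items")
print(f"in-progress band widened by {wip_growth} items -> possible bottleneck")
```

A real CFD plots these counts as stacked areas, but the diagnostics come from exactly these deltas.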

Doc Norton, an agile and leadership coach, spoke about using velocity and other metrics at Experience Agile 2019. InfoQ interviewed him.

InfoQ: How do you define velocity?

Doc Norton: Velocity, at its simplest, is work units over time toward value. In most places, it is simply work units over time, but that's not velocity, as it has no clear direction. That's just speed. Velocity is a vector; it needs a direction. That's why we add "toward value". It reminds us that we're headed in a specific direction. Value, as I see it, is learning, reduction of risk, or increased utility.

For me, we don't count velocity until we've realized the value. If the team learned something, such as whether or not the new treatment increases interaction; definitively reduced a risk, such as proving the new caching technique works when a flaky system we depend on goes down; or delivered utility, such as a feature that users are actually engaging with, then we consider the story done and count it toward the velocity. In a lot of places, velocity gets counted when development is done. If the story never goes live, they still count it toward velocity. I think they're missing a key point.

At a deeper level, velocity is also a lagging indicator of a complex system. Like all lagging indicators, it is not very useful for predicting the future, but it is good for confirming trends and patterns. Think about quarterly sales in much of the retail space - the fourth quarter is not an indicator of what first-quarter sales will be, but it can confirm for us the general trend that sales are higher at the end of every year.

InfoQ: You stated in your talk that velocity is nearly useless. Can you elaborate?

Norton: I’ve now surveyed over one thousand individuals at agile conferences. Approximately 90% of those surveyed have low confidence in their velocity, have a disconnect between velocity and deployment, or cannot reliably project a large chunk of work using velocity alone.

Only 10% of agile adopters surveyed find velocity useful for actually measuring and projecting when work will be done and in production - and planning and forecasting is the whole purpose of velocity.

Based on this data, I think it fair to say that velocity is nearly useless.

InfoQ: Sometimes people say that velocity is something that can be gamed by teams. What is your view on this?

Norton: This is a dangerous perspective to take, especially if you are in management. A leader who blames the actors is missing the big picture and is failing to see the role they play in creating the outcomes.

Every system is perfect; it produces the exact result it is designed to produce. If the system in place is producing the wrong outcomes, it is less the fault of the actors within the system and more the fault of the system itself. We can admonish folks for not being better actors. We can insist that they should behave differently based on some set of beliefs or rules about humans - some moral or ethical expectations. But a key factor in all human behavior is the system or context within which they are operating. A perfectly rational individual who wants to do good work will inflate estimates or cut quality corners in an environment that promotes and rewards speed more than it promotes and rewards safety. This is not universally true. Some employees will insist on safety over speed. But it is true enough to have an impact on most teams in most organizations.

So when management puts a focus on speed or, worse yet, sets targets for speed, the resulting behavior from the actors is a natural consequence. It is not the team that games the system, it is the system designer that creates the game.

InfoQ: What if teams want to use velocity to see if they are improving, is that possible?

Norton: Perhaps, but not well.

First of all, as velocity is typically story points per iteration and story points are abstract and estimated by the team, velocity is highly subject to drift.

Drift is a series of subtle changes that add up over time. You don't usually notice them in the small, but compare across a wider time horizon and they are glaringly obvious. Take a team that knows they are supposed to increase their velocity over time. Sure enough, they do. And we can probably see that they are delivering more value. But how much more? How can we be sure?

In many cases, if you take a set of stories from a couple of years ago and ask this team to re-estimate them, they’ll give you an overall number higher than the original estimates. My premise is that this is because our estimates often drift higher over time. The bias for larger estimates isn’t noticeable from iteration to iteration, but is noticeable over quarters or years. You can use reference stories to help reduce this drift, but I don’t know if you can eliminate it.

Second of all, even if you could prove that estimates didn’t drift at all, you’re still only measuring one dimension - rate of delivery. To know if you are improving, you might need to also know about code quality, customer adoption, customer retention, and system performance, to name a few.

Velocity won't tell you anything about system health, team health, or company health. It doesn't really even tell you about delivery health. It is pretty hard to determine if you're improving with such little information.

InfoQ: What are your suggestions for using velocity to forecast when something will be done?

Norton: A forecast is only as good as the data and technique used to create it. For most teams, estimating with velocity means you have a roughly 50:50 chance of completing on or before the forecast date. Think about it - you are using a mean and extrapolating - of course your forecast is going to be in the middle of the range of possibilities. It also means you have little to no idea by how much you might miss the mark. Is the range of possible delivery dates a couple days, a couple weeks, or a couple months? Based on the technique you used, you cannot know this.

I used to apply a standard deviation against the mean in order to get a range. I then came to realize that many velocity distributions are not Gaussian, but are instead skewed, so standard deviation isn't mathematically appropriate. And it was hard to explain the range in terms of probability.

The most reliable technique I know of right now is to run Monte Carlo simulations. The folks over at Focused Objective have a spreadsheet that does this for you; it's called the Throughput Forecaster and you can download it from their Free Tools and Resources page. Based on historical velocity data and a sized backlog of work, they run 500 simulated projects to determine the probability of completing the work on or before a set of dates. This is a simplified explanation of what happens, but it is sufficient for a basic understanding.
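The core of such a simulation fits in a few lines of Python. This is a minimal sketch with made-up velocity and backlog numbers, not the Throughput Forecaster itself, and it ignores refinements such as backlog growth from story splitting:

```python
import random

def forecast_iterations(velocity_history, backlog_size, trials=500, seed=7):
    """Monte Carlo forecast: for each simulated project, draw iteration
    velocities at random from the team's history until the backlog is
    exhausted, and record how many iterations that took."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        remaining = backlog_size
        iterations = 0
        while remaining > 0:
            remaining -= rng.choice(velocity_history)
            iterations += 1
        outcomes.append(iterations)
    return sorted(outcomes)

# Made-up example: six iterations of velocity history, 120 points of backlog.
history = [8, 9, 12, 14, 19, 21]
outcomes = forecast_iterations(history, backlog_size=120)

# The Nth percentile of the sorted outcomes is the number of iterations
# within which N% of the simulated projects finished.
for pct in (50, 85, 95):
    idx = int(len(outcomes) * pct / 100) - 1
    print(f"{pct}% confident: done within {outcomes[idx]} iterations")
```

The output is a range with attached probabilities rather than a single date, which is exactly what makes the conversation with stakeholders better.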

This technique provides a range with probabilities, allowing you to have a much better conversation. Using this technique, a number of our customers have come to discover that the velocity technique they were using was producing numbers that had less than 50% probability. It is wonderful to see the relief wash over their faces as they realize it wasn't that they couldn't manage the teams well; it was that their forecasts were insufficiently formulated.

InfoQ: How can we measure quality using escaped defects?

Norton: Escaped defects are defects found in production. For teams I coach, there are no other kinds of defects, so "escaped" is superfluous. But in some organizations, there is this concept of QA defects and escaped defects. Escaped defects is all we care about here.

Measuring the number of escaped defects introduced in an iteration or throughput sample can be helpful, but it can be misleading. We might discover that the number of escaped defects per iteration is increasing. We know this is not "good" news, but do we know just how "bad" the news is?

To help us make this determination, we can instead look at the number of escaped defects relative to the throughput. This gives you a ratio that you can use to see trending. The hard part is that you need to make a concerted effort to identify during which iteration or throughput sample cycle you introduced the defect. Depending on the team and how the system is used, this isn’t always easy to do.

Say you have a velocity history of 8, 9, 12, 14, 19, and 21 and escaped defect count of 2, 2, 3, 3, 4, and 4 respectively. If escaped defects are a measure of quality, is the quality improving, degrading, or maintaining?

If we look at escaped defects alone, they are clearly on the rise. From this, we might conclude that the quality of the software we release each cycle is degrading.

If, however, we look at the ratio of escaped defects to throughput, we get .25, .22, .25, .21, .21, and .19. From this, we see an overall downward trend in escaped defects per unit of throughput. So while the raw defect count is increasing, defects are actually declining relative to the rate of value delivered.
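The ratio arithmetic can be checked in a couple of lines of Python, using the velocity and defect counts from the example above:

```python
velocity = [8, 9, 12, 14, 19, 21]   # velocity history, one value per cycle
defects  = [2, 2, 3, 3, 4, 4]       # escaped defects introduced each cycle

# Defects per unit of throughput, per cycle.
ratios = [round(d / v, 2) for d, v in zip(defects, velocity)]
print(ratios)  # -> [0.25, 0.22, 0.25, 0.21, 0.21, 0.19]

# Raw counts rise, yet quality relative to delivered value improves.
print("raw defects trending up:", defects[-1] > defects[0])
print("defect ratio trending down:", ratios[-1] < ratios[0])
```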

InfoQ: How can teams become better at using metrics?

Norton: Here are a few things I think teams need to consider if they want to leverage metrics in a more beneficial manner:

  • Avoid setting targets for measurements.
  • Focus on trends, not single data points.
  • Measure multiple aspects of the system.
  • Use the information to inform adjustments to the system.

Avoid setting targets for measurements. When a measure becomes a target, it is no longer a good measure. This is a paraphrase of Goodhart’s Law. Metrics are information about how the system is operating. They are a byproduct of the system. When you take a measure and you convert it to a control by setting a target, you inject the measure into the system - you literally change the system. The measurement no longer means what it once did and, as a result, the target you believe you just set is no longer the target you think it is.

Focus on trends, not single data points. Concern yourself less with today’s value and more with the trend. How is the metric trending? Is it headed in a direction that you expect based on your strategy? If it is not trending as expected, consider what in the system might be a root cause and work to address it. Know that there will be times where a sound strategy will cause metrics to trend counter to the normally desired direction. For example, code tends to get more complex while you are transitioning from a monolith to microservices. Once the old monolithic code is removed, the complexity falls again.

Measure multiple aspects of the system. There are tensions in every system. If we measure only a single dimension, we are less likely to understand the true health of the system. For example, if we were to focus exclusively on rate of delivery, we would likely fail to notice the impact on quality, customer adoption of features, or employee satisfaction. Each of these ultimately impacts our ability to sustain the system.

Use the information to inform adjustments to the system. We’re measuring to inform, not setting targets to drive. We’re looking at trends and contrasting them against our expectations based on our strategy. And we’re looking at multiple dimensions to help ensure we don’t over-optimize on a particular dimension. When our trending is off in any of these dimensions, we want to consider what adjustments we need to make in order to bring the system closer to a healthy balance.

About the Interviewee

Doc Norton, agile and leadership coach, is a software delivery professional working to make the world of software development a better place. His experience covers a wide range of development topics. Norton declares expertise in no single language or methodology and is immediately suspicious of anyone who declares such expertise. An author and international speaker, Norton is passionate about helping others become better developers, working with teams to improve delivery, and building great organizations. In his role at OnBelay, Norton is provided opportunities to realize his passion every day.
