A common project management criticism is that since story points vary across teams, there is no way to ascertain one team's progress with respect to another's. Among Agilists there is a general consensus that comparing velocity across teams is an anti-pattern, best avoided lest overall productivity suffer.
Sterling Barton suggested that team velocity depends on various factors, and can be summarized as a function:
velocity = f(sprint length, team makeup, sizing nomenclature, product)
All of these factors vary across teams, and hence the velocities of different teams cannot be meaningfully compared.
Likewise, Danilo Sato noted that since velocity depends on so many factors, it only makes sense to compare velocity within a team, and even then only as a trend to gauge the team's progress. He wrote,
“Why is Team A slower than Team B?” Maybe because they estimate in different scales? Maybe their iteration length is different? Maybe the team composition is different? So many factors can influence velocity that it’s only useful to compare it within the same team, and even then just to identify trends. The absolute value doesn’t mean much.
A further drawback of comparing team velocities, mentioned by Bob Hartman, is that teams start changing their story point scale so that their velocity looks better in the comparison.
Suddenly what was a size 1 last iteration is now a size 3 (or worse!). Don’t fall into this trap. If teams are working hard, meeting their iteration objectives and keeping the product owner happy I don’t care if their velocity is 10 or 10,000.
However, most managers would argue that there should be some mechanism to baseline story points across teams for a valid comparison. Mike Cohn attempted to arrive at such a common baseline by bringing together a broad group of individuals from across teams, 46 people in all, to estimate a dozen product backlog items. He added,
When that meeting was over, each pair of estimators went back to their teams with twelve estimates. Those estimates could then be used as the basis for estimating future work. As each team estimated new product backlog items they would do so by comparing them to the initial 12 plus any estimates that had been produced since (by them or any other team).
Mike was, however, quick to point out a pitfall of this exercise: once teams are compared to each other, they respond to the peer pressure by gradually inflating the story points they assign to stories.
Consider, for example, a team that is arguing over whether a particular story should be estimated at 5 or 8 points. If the team is under pressure (real or just perceived) to increase velocity they will be more likely to assign the 8. The next story the team considers is slightly larger. They compare it to the newly assigned 8 and decide to give it a 13. Without pressure to improve velocity, this same team may have given the first item a 5 and the second (slightly larger still) item an 8. In this one scenario the team has inflated their points from 5+8=13 to 8+13=21, or more than 50%.
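To make the arithmetic concrete, here is a quick back-of-the-envelope sketch (ours, not Mike's) of the inflation in that scenario:

```python
# Mike Cohn's inflation scenario: the same two stories, estimated with and
# without pressure to increase velocity.
honest = [5, 8]     # estimates the team would assign without pressure
inflated = [8, 13]  # estimates the team assigns under (perceived) pressure

honest_total = sum(honest)      # 13
inflated_total = sum(inflated)  # 21

inflation = (inflated_total - honest_total) / honest_total
print(f"{honest_total} -> {inflated_total} points: {inflation:.0%} inflation")
# 13 -> 21 points: 62% inflation
```

The work itself has not changed at all; only the numbers attached to it have, which is why cross-team velocity comparison rewards inflation rather than productivity.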
Mike does advocate creating a common baseline; however, he also warns managers to be cautious and stay on constant lookout for scenarios in which story points might be inflated.
Dave Nicolette gave an interesting example of how story points might vary across teams.
How many Elephant Points are there in the veldt? Let's conduct a poll of the herds. Herd A reports 50,000 kg. Herd B report 84 legs. Herd C reports 92,000 lb. Herd D reports 24 head. Herd E reports 546 elephant sounds per day. Herd F reports elephant skin rgb values of (192, 192, 192). Herd G reports an average height of 11 ft. So, there are 50,000 + 84 + 92,000 + 24 + 546 + 192 + 11 = 142,857 Elephant Points in the veldt. The average herd has 20,408.142857143 Elephant Points. We know this is a useful number because there is a decimal point in it.
He added that if these story points were plotted on a graph, then going purely by the numbers it would be easy to fire Herds D & G and reward Herds A & C.
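A throwaway sketch of Dave's satirical arithmetic makes the point: a script (or a spreadsheet) will happily sum and average numbers whose units make them incomparable.

```python
# Each herd reports "Elephant Points" in a different unit, per Dave's example;
# summing them is syntactically easy and semantically meaningless.
herd_reports = {
    "A": 50_000,  # kg
    "B": 84,      # legs
    "C": 92_000,  # lb
    "D": 24,      # head
    "E": 546,     # elephant sounds per day
    "F": 192,     # one RGB channel of elephant skin
    "G": 11,      # average height in ft
}

total = sum(herd_reports.values())   # 142,857 "Elephant Points"
average = total / len(herd_reports)  # 20,408.142857... per herd
print(f"total={total:,}, average={average}")
# total=142,857, average=20408.142857142857
```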
Thus, in most circumstances comparing velocity across teams is a futile exercise. If a consensus on assigning story points is somehow built across teams, one might attempt a comparison, but even then with a great deal of caution.
Community comments
Shared Product Backlog?
by Jim Leonardo,
I agree comparison is an antipattern, but the puzzle I have is... How do you determine what to pick up if you have a shared product backlog among teams? If story points vary, how can you know what to grab? Or is that even an antipattern itself?
One thing you always have to look at when comparing teams or comparing individuals is things like defect rate, "accepted but suboptimal" issues, etc. It's easy to maintain high V if you spew bugs left and right.
Re: Shared Product Backlog?
by Dave Nicolette,
"If story points vary, how can you know what to grab? Or is that even an antipattern itself?" IMHO that's not an antipattern, but it's an interesting question. It leads to a deeper question: Do you need story points at all? If so, why? Are you using them for something, or just tracking velocity because a book tells you to do so? Could the team work in a different way that made story points unnecessary? I think it's healthy to question our assumptions. It sometimes leads to improvement. It will at least lead to a better understanding of why we do the things we do.
"...easy to maintain high V is you spew bugs..." Wrong by definition. The team gets no credit for broken stories. The stories aren't "done" if they have bugs in them.
Re: Shared Product Backlog?
by Danilo Sato,
+1 to what Dave said regarding bugs.
I wrote about the problem with counting half-done points too:
www.dtsato.com/blog/2009/07/03/velocity-gone-wr...
www.dtsato.com/blog/2009/07/04/velocity-gone-wr...
Far too weakly worded
by J. B. Rainsberger,
Measuring velocity across teams is not just "an antipattern" and "best avoided". It's mismanagement in the extreme.
Don't get me wrong: I understand a manager's impulse to measure *something*, but this isn't it. Measure almost anything else.
Incidentally, when I say that cost estimates are waste, this is exactly the kind of waste that cost estimates cause. We spend all this energy arguing over "story points", either within a team, among teams, or across the organization, when we /need/ to focus on increasing the flow of value.
Forget velocity: bit.ly/UUdwX
Create a Meta Team for Estimating
by Tim Elton,
We created an estimation meta team, in much the same way as some organizations create an architecture meta team. The estimation meta team is composed of members from each of our three Scrum teams. We also ensure that the meta team has at least one architect, developer, database, and QA. We wanted the estimation group to remain consistent, but did not want to make it an ivory tower of estimation. So, every Sprint we replace only one member of the meta team so that everyone has a chance to cycle through without losing the core group.
Since all our teams are working on the same product, we have a combined backlog. Using the estimation meta team, we are able to keep the backlog estimated several sprints' worth of stories in advance. We realize that we are trading away a little accuracy in the estimates by using this approach, but at the same time we feel that normalized points are more useful and valuable.
Re: Create a Meta Team for Estimating
by J Nasstrom,
What is wrong with calculating back to man-days or man-hours and comparing that? Meaning that we compare the monetary cost between teams.
If the team has an accurate estimate of their velocity, they could say what proportion of the next sprint would be occupied by a feature. The man-days correspond to the cost to the company, which is a relevant measure.
The lowest bidders will be held responsible for their estimation.
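A minimal sketch of the conversion this comment proposes, with made-up team sizes, sprint lengths, and velocities, since none of these figures come from the thread itself:

```python
# Convert velocity back into person-days per story point so that two teams'
# "cost per point" can be put side by side. All figures are hypothetical.
teams = {
    "Team A": {"members": 5, "sprint_days": 10, "velocity": 30},
    "Team B": {"members": 7, "sprint_days": 15, "velocity": 60},
}

for name, t in teams.items():
    person_days = t["members"] * t["sprint_days"]  # capacity per sprint
    cost_per_point = person_days / t["velocity"]   # person-days per point
    print(f"{name}: {cost_per_point:.2f} person-days per story point")
# Team A: 1.67 person-days per story point
# Team B: 1.75 person-days per story point
```

Whether such a comparison is meaningful is exactly what the reply below disputes.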
Re: Create a Meta Team for Estimating
by Arijit Sarbagna,
Actually, this doesn't work by bringing in man-hours, as that is an 'estimate'. Story points are about 'sizing'. :) Now, yes... it is possible to compare monetary costs, but that will mean you need to have two teams working on the same set of functionalities, under a similar environment (and with the same impediments as well). Now, does it make sense to do this, or rather to focus on improving the velocity of each team as they continue on their respective paths? The answer will tell you why it is simply "being stupid" to even try comparing teams in Scrum.
Book Reference
by Enrique Acuna,
I am currently working under the Scrum methodology, but our Business Owner compares our teams using velocity as a metric. I would like to know if someone could give me a book reference that explicitly clarifies this.
Acceleration versus Velocity
by Greg Warden,
I am considering measuring Average Acceleration (running avg of story points/week^2) as a measure of whether we are speeding up or slowing down across several teams. Thoughts?
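For what it's worth, a minimal sketch of the metric this comment proposes, assuming "acceleration" is taken as the change in velocity between consecutive sprints (the velocity data is invented):

```python
# Running average of sprint-over-sprint velocity changes, a rough
# "acceleration" signal in points per sprint squared.
velocities = [21, 24, 23, 28, 30]  # story points per sprint (made up)

# First differences: change in velocity from one sprint to the next.
accelerations = [b - a for a, b in zip(velocities, velocities[1:])]

avg_acceleration = sum(accelerations) / len(accelerations)
print(f"per-sprint accelerations: {accelerations}")      # [3, -1, 5, 2]
print(f"average acceleration: {avg_acceleration:+.2f}")  # +2.25
```

Note that this inherits all the caveats above: the trend is only meaningful within a single team's own scale.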