
How To Not Destroy Your Agile Team with Metrics


I think the agile community needs to change how it measures success for agile teams. The ways we gather metrics and the information we seek from those metrics are actually getting in the way of what’s most important: making working software.

By forcing individual metrics we sometimes discourage team collaboration; by focusing too intently on other metrics we can actually skew the very thing we’re trying to measure, defeating the purpose.

The way I see it, there are two major problems:

The observer effect: The observer effect states that observing a process can impact its output. For instance, telling a team that you’ll be keeping a close eye on their velocity might cause that team to overestimate their items in order to increase their velocity. This is especially dangerous when working with story points since there’s no way to compare the validity of an estimate.

[Comic: the observer effect at work]

While the above comic has probably happened at some point, it’s not my favorite example of the observer effect at work. Let’s talk about a support person I knew a long time ago; we’ll call him “Jason” since that was his name. Now Jason was a great tech: he helped others on particularly difficult calls, he solved problems correctly (generally on the first call), and he got great feedback from customers. The problem was that Jason’s call times were too long, and this particular metric was very important to management. A few meetings and a review later, it was made clear that Jason HAD to get his times down or look for another job. Fast forward a few weeks, and Jason was now in the top 5 of the entire support group for call times. How did he do it? He wouldn’t tell anyone for the longest time, until one day I came in early and there was Jason, an hour before his shift, picking up calls and immediately hanging up.

Here’s the interesting thing: Jason wouldn’t have done something like that if his call times hadn’t been more important than his actual performance. Measuring his call times negatively affected his output. Moreover, this was a bad metric to begin with; even without extreme examples like Jason, we’ve all been on a call with a tech support agent who just wants to get us off his line. The question is: which calls are your teams hanging up on to make their numbers?

The streetlight effect: The streetlight effect is our human tendency to look for answers where it’s easy to look rather than where the actual information is. For instance, counting the lines of code produced is easy, but it doesn’t tell us anything about the quality of the application, the functionality it provides, or even its effectiveness.


I recall that some time ago I was working on a team that made multiple products, each with different quality standards. The thing was that “Product A” had much stricter quality standards than “Product B”, “Product C”, or “Product D”, which wouldn’t have been too big a problem except that management had decided that quality would be a big deal when the next review came around.

The thing is, something like “quality” is a nebulous concept and not really easy to measure. Error rate, however, is much easier to measure; thus anybody who found themselves working on “Product A”, with its higher quality standards, would be at a bigger disadvantage come review. So who ended up doing that work? Interns mostly, temps and contractors when they were around, and anyone else who couldn’t avoid it.

As it turns out, even though error rate was easy to measure, it didn’t tell us anything valuable, since the number of errors produced depended more on the product than on the employee. Instead we drove away several good new hires, lost a customer, and lowered morale for the whole team, since their job became less about building and more about avoiding errors.

Now, both of these examples take place outside of software development, so let’s apply these concepts to some common “agile” metrics you might be familiar with. What’s easy to measure?

Unit Tests Written: Most agile developers write a lot of unit tests, and test-driven development creates even more tests (both of which produce better quality code). So measuring a developer’s productivity by the number of tests they create must be good! Actually, the observer effect kills this one dead. Telling developers that they’ll be measured on the number of tests they write ensures they’ll create many tests with no regard for the quality of those tests. Our goal is not to ship tests; our goal is to ship working code. I’ll take fewer, better tests over more, crappier tests any day.

Individual Velocity: Once again the observer effect makes this a bad metric. If a developer knows he’s being individually graded on his performance, and also knows that he only gets credit for the things he specifically works on, then he’s actively discouraged from contributing to the group. He’s placed in the very un-agile situation of competing with his team rather than contributing to it.

In a perfect world an agile team is collaborating, interacting, discussing, and reviewing almost everything they do. This is a good thing for building quality software and solving problems fast, but this level of interaction makes it nigh impossible to separate a person’s individual productivity from the group’s. So don’t try; you’ll simply hurt your team’s ability to make good software.

Team Velocity: This is one of the most misunderstood metrics in all of Scrum. A team’s velocity is unique to them; it simply can’t be compared to another team’s. Let’s say that team A estimates a certain amount of work at 50 pts. for a sprint and team B estimates that same work at 150 pts. for the same sprint. Now if both teams finish their sprint successfully, then team A has a velocity of 50 pts. and team B has a velocity of 150 pts. Which team is more productive? Neither. They both did the same amount of work.

This metric is particularly evil because it encourages teams to fudge the numbers on their estimates, which can affect the team’s ability to plan their next sprint. If the team can’t properly plan a sprint, that puts your entire release in danger of shipping late.

For more about your Scrum team’s velocity, you can check out an earlier blog post I wrote.
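To make the arithmetic concrete, here’s a minimal sketch (hypothetical stories and made-up point values) of two teams estimating the same backlog on their own scales:

```python
# Two teams estimate the *same* backlog on their own point scales.
# Both finish everything, so both did the same work, yet their
# "velocities" differ by a factor of three.
backlog = ["login", "search", "checkout"]

team_a_points = {"login": 10, "search": 15, "checkout": 25}
team_b_points = {"login": 30, "search": 45, "checkout": 75}

velocity_a = sum(team_a_points[s] for s in backlog)  # 50
velocity_b = sum(team_b_points[s] for s in backlog)  # 150

print(velocity_a, velocity_b)  # same work, incomparable numbers
```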

Okay smart guy, what metrics should we use?
Glad you asked: we measure productivity by the working software we deliver. We measure actual output rather than contributing factors. This approach is more agile because it frees the team to build software in whatever way best contributes to their success, rather than whatever way produces better metric scores. It’s also much more logical, since working software is something we can literally take to the bank (after it’s been sold, of course).

So what are the actual new metrics?

Value Delivered: You’ll need your product owner for this. Ask him to give each user story a value that represents its impact on his stakeholders. You can express this as an actual dollar amount or as some arbitrary number. At the end of each sprint you’ll have a number that tells you how much value you’ve delivered to your customers through the eyes of the product owner.

This metric does not measure performance; instead it measures impact. Ideally your product owner will prioritize higher value items toward the top of the backlog, and thus each sprint will deliver the maximum value possible. If you’re working on a finite project with a definite end in sight, your sprints will start out delivering very high value and will gradually trend toward delivering less and less as you get deeper into the backlog. At some point, the cost of development will eclipse the potential value of running another sprint; that’s typically a good time for the team to switch to a new product.
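As a minimal sketch of the bookkeeping (the story names, values, and fields here are hypothetical, not a prescribed format):

```python
# Sum the product-owner-assigned values of the stories completed in a
# sprint. Story names and value numbers are made up for illustration.
sprint_backlog = [
    {"story": "one-click reorder", "value": 8000, "done": True},
    {"story": "export to CSV",     "value": 3000, "done": True},
    {"story": "dark mode",         "value": 500,  "done": False},
]

value_delivered = sum(s["value"] for s in sprint_backlog if s["done"])
print("Value delivered this sprint:", value_delivered)  # 11000
```

Tracked sprint over sprint, the trend of this number matters more than any single sprint’s total.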

On Time Delivery: People sometimes tell me that agile adoption failed at their company because they couldn’t give definite delivery dates to their clients. I don’t buy this. One thing that an agile team should definitely be able to do is deliver software by a certain date. It’s possible that a few stories won’t be implemented, but those are typically the lowest value stories, the ones with the least impact on the client. That being said, a team’s velocity should be reasonably steady; if it goes up or down, it should do so gradually. Wild swings in velocity from sprint to sprint make long term planning harder to do.

Here’s the metric: if a team forecasts 5 stories for an upcoming sprint and delivers all 5 on time, they earn 2 points toward this metric. If they deliver 4 stories, or they deliver less than 2 days early (pick your own number here), they earn 1 point. If they deliver more than 2 days early, or only 3 of the 5 stories, they earn no points. At the end of a quarter, a release, or the year, the team is judged by how accurately they can forecast their sprints.
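Expressed as a small scoring function (a sketch of the rule above; the 2-day window and the one-story tolerance are the “pick your own number” knobs):

```python
# Forecast-accuracy score for one sprint, per the rule described above.
# forecast/delivered are story counts; days_early is how far ahead of
# the sprint's end the team finished.
def sprint_score(forecast: int, delivered: int, days_early: int,
                 early_window: int = 2) -> int:
    missed = forecast - delivered
    if missed == 0 and days_early == 0:
        return 2  # delivered exactly what was forecast, right on time
    if missed <= 1 and days_early < early_window:
        return 1  # one story short, or finished only slightly early
    return 0      # way early, or two or more stories short

# A quarter is just the sum of per-sprint scores.
quarter = [(5, 5, 0), (5, 4, 0), (5, 5, 3), (5, 3, 0)]
print(sum(sprint_score(f, d, e) for f, d, e in quarter))  # 2+1+0+0 = 3
```

Note that finishing very early scores as badly as finishing short: either way, the forecast was off.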

So what we’re measuring is the value delivered to the customer and the on-time delivery of that software: the only two real metrics you can literally cash checks with.

About the Author

Sean McHugh is one of the Scrum Masters at Axosoft, the creators of OnTime, an agile development tool. He works with customers who are brand new to Scrum as well as experienced customers who are beginning to implement a Scrum project management software solution for their development teams. He gets to work with teams from around the world, each with their own unique challenges and solutions. He loves to share his thoughts and experiences with the Scrum community by writing on the company blog.


Community comments

  • Even number of stories can go wrong...

    by Melle Koning,

    I have seen a PowerPoint demonstration with these kinds of 'user stories':

    1. Change the text on the submit button so that a role can more easily read the action...

    And when that is presented, they shout: Yes! Another five-story-point user story removed from the backlog!

    It seems that teams will always find ways to game the system.

  • Measure business value

    by Roopesh Shenoy,

    I think measurements are not going to be of any use unless you can directly see the business value they impact - and even then, only a subjective measurement is really useful, compared to a very objective measurement.

    We need to focus on how to create more value rather than how to do stuff faster - once you build trust and have open communication channels between IT and other functions, it is definitely possible and worthwhile to do that, IMHO.

  • Re: Measure business value

    by Roopesh Shenoy,

    Just to be clear, I agree with what is important and what is not, as per the article, but I do not think we need to go into such detail, measuring stories and dollar amounts. Software has a very high potential multiplier effect depending on the scale at which it operates - the value a particular piece of software brings depends not only on the feature but also on the size of operations, especially the number of existing and potential customers/users.

    I don't buy that long term planning needs more accurate estimates - in fact, this is true only of projects that are less important in value compared to those which are intrinsically valuable.

    Disclaimer - I had read about this somewhere, so this is not an *original* epiphany from me - but I totally agree with it.

    If you are building software that is intrinsically valuable to the company in the long term - innovative, with core-business impact, etc., where the projected cost is x and the value you get out of it is 50x - then:

    1. Even if the budget explodes 2-3 times *for a given set of features*, it's fine - people may be unhappy for a while, but in the long run it won't matter. Depending on your budget situation, either you get fewer features or pay more for the same set of features.
    2. Time taken is never a constant - it's more about how many features you deliver in a given time frame rather than the other way round. In fact, most important projects will never run out of features, so it's more important to keep the backlog organized.


    On the other hand, for projects where the cost is x and the expected value is 1.5 to 2x, it becomes way more important to control the project. It might seem like a great idea to take it up, but given the inherent difficulty of estimating a software project, it may not make sense to take up such projects at all.

    So in conclusion, I think it's more important to focus on the right projects, rather than trying to make the wrong projects successful by measuring a whole lot of useless things.

    Also, it depends on who wants to measure. If the team wants to measure their own progress, that's great; any metric they bring in might be useful. On the other hand, if people external to the dev organization want to measure things, then it's a recipe for disaster - what you measure simply doesn't matter, you've got a really big culture problem that will doom your company.

  • You'll always get what you measure

    by Matthias Marschall,

    I've seen both happening: the observer effect and the streetlight effect. Both can really have a very bad impact on your teams and products.
    But measuring value proved to be extremely hard for us. We were not able to assign meaningful value ratings to the stuff we build. Eventually the PO has to prioritize based on his gut feeling.
    Currently we do not measure anything regularly. But we do keep an eye on anything that does not work as expected and make sure we analyze it and learn from it.

  • Re: You'll always get what you measure

    by Roopesh Shenoy,

    Eventually the PO has to prioritize based on his gut feeling.


    :) 100% agree - that's why you also need awesome product owners/managers, and not just awesome developers.

  • Re: You'll always get what you measure

    by Matthias Marschall,

    absolutely

  • Value Delivered

    by Tamas Rev,

    It's very tricky to measure the value of the delivered software (a.k.a. the impact). It works if and only if:


    • The team knows the business value of a certain story _before_ they develop it. In fact, nobody knows that; the business department is only estimating it.

    • The team can give high priority to the stories with the best impact / story point ratio.


    Unfortunately, neither of the above is always true.

    Very often the customer needs some stories with high impact to be developed no matter how much time you estimate for it. They just need it, period.

    Also, the business department is likely to re-evaluate developer performance based on the actual $$ impact. This can be very unfair when the business messes up its estimates.


    At CITCON it became clear that these measures shouldn't trigger hard decisions, for the same reasons you mention in this post. These metrics are rather just tools for getting information about what's happening.

  • Good article

    by Dave Nicolette,

    Key take-away: "We measure actual output rather than contributing factors."

  • Re: Even number of stories can go wrong...

    by Sean McHugh,

    The metric I mentioned above doesn't have anything to do with the specific stories delivered or even the size of those stories. The goal is to be accurate in the stories forecasted: it doesn't matter whether you forecast 5 story points or 500 story points, so long as you deliver close to what you predicted. The goal is to deliver neither much more nor much less than what you forecasted, so that a sprint stays predictable.

    There are certainly ways to game this system, but hopefully the benefits of doing so are minimal.

  • Re: Measure business value

    by Sean McHugh,

    I completely agree: rather than focusing on developing faster, we should focus on delivering more predictably. That's the essence of the On Time Delivery metric.

    The other focus, of course, is Value Delivered, which is always going to be a subjective value, but hopefully a useful one that guides the team toward providing more value per sprint.

  • Re: Even number of stories can go wrong...

    by Melle Koning,

    Hi Sean,

    I do agree with that conclusion. It goes without saying that eventually the delivered value is the important metric. Even better, if you happen to have multiple product owners who estimated(!) upfront what the business value would be, we will get better POs out of this process as well.

    Cheers,

  • Re: Measure business value

    by Roopesh Shenoy,

    Hey Sean,

    I agree with most of your article about not measuring stuff like velocity, but I don't agree with the On-time delivery metric (at least the way I understood it).

    On-time implies that the original estimates were accurate and that somehow the team can be coached to be good at estimating - I say there's a really, really small chance that's correct. This is not like building a house, where building the next house is somewhat similar in terms of cost and effort - in software, if something is built once, you tend to reuse it rather than rewrite it (which might be the only place where estimates are accurate). Most of the stuff we write is new, and a lot of time also gets spent on non-development tasks such as designing solutions to requirements and research (which is work, btw).

    The closest approach I like is how FogBugz handles this - they go the other way and run a Monte Carlo simulation of past estimates vs. actuals, giving a probability curve of when the software might be delivered. I think that's a much better way to handle this than trying to punish people who may not be great at giving accurate estimates but might be star developers otherwise.
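    Roughly, that kind of simulation looks like this (a toy sketch with made-up history, not FogBugz's actual algorithm):

```python
import random

# Toy Monte Carlo: scale the remaining estimate by ratios of
# actual/estimated time observed on past stories (history is made up).
past_ratios = [0.8, 1.0, 1.1, 1.3, 1.5, 2.2]
remaining_estimate = 40.0  # estimated days of work left

runs = sorted(remaining_estimate * random.choice(past_ratios)
              for _ in range(10_000))

# Percentiles of the simulated outcomes form the probability curve.
print("50% chance done within", runs[len(runs) // 2], "days")
print("90% chance done within", runs[int(len(runs) * 0.9)], "days")
```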

    Disclaimer: I have no vested interest in either of these products, and it's been a while since I used either of them so I may not be aware of the latest changes.

  • No Quality Metrics?

    by Brett Powell,

    Ok, so we have a business value metric (agreed).
    We have a throughput-type metric (reliable delivery), assuming that reliable delivery is what you are optimising for (this is not always the case).

    One thing that Agile has taught us is that the only way to go fast is to go well (Robert C. Martin).
    What are your thoughts on quality metrics?
    Brett

  • Re: No Quality Metrics?

    by Sean McHugh,

    I see quality as a contributing factor to actual output: if you have poor quality, then the value you provide will begin to go down (which should bring quality up in conversation during your retrospectives). Another big problem with quality metrics is that you're measuring a negative impact, which has its own problems (morale, deceptiveness, finger pointing, etc.). If quality is an issue, it won't be a secret (bugs in the backlog), but if it isn't an issue, then there's no reason to focus on it with a metric.

  • Re: No Quality Metrics?

    by Brett Powell,

    Yes, poor quality will eeeeeventually show up in reduced output, but that is an extremely long lagging indicator.
    For a team to self-correct, we need feedback as early as possible so that we can fix issues before they start impacting velocity.

    The big quality issues are not just bugs, but business and user satisfaction, coherence of the code, etc.
    This is something we are still trying to figure out. I know from experience that focusing on code craftsmanship, great user experiences, and good technical practices increases capacity over time, but how do we help teams measure this and use it to self-correct?
    The problem I have with velocity and business value as the metrics is that you embed short-term thinking into the team. I am trying to figure out how to use metrics that are leading indicators of future capacity creation.

    Brett

  • Re: No Quality Metrics?

    by Roopesh Shenoy,

    I don't think metrics will work for quality - IMO, software development is human-capital-intensive work, and when we try to focus on processes and metrics but forget the individual, we make a big mistake.

    Instead we could focus on creating a culture and promoting practices that improve quality - e.g. code reviews (both within and across teams), "hallway" usability tests, and interaction with the end user (it is surprising how many development teams "never" speak to their end users) are a few practices that I think help a lot as far as quality is concerned!

  • Re: No Quality Metrics?

    by Charles Bradley, Scrum Trainer...,

    I disagree with Sean on this, to a certain degree. If you're going to attempt to measure value delivered, then you also need to use the same measuring stick for "value removed" via defects.

    Another simple metric is the number of high-impact bugs released into the wild. High impact might mean "needs a special bug fix release" or something similar. There will always be blamestorming in these instances, but a) Scrum holds the *whole team* accountable NO MATTER WHAT, and b) you will need someone with ethical behavior to determine whether a bug is high impact or not.

    Mike Cohn has some good info on a "Balanced Scorecard" for metrics in his _Succeeding with Agile..._ book. Martin Fowler has a good article on how software productivity cannot be measured: martinfowler.com/bliki/CannotMeasureProductivit...

    I think we're also getting to the point where we need to use tools like Sonar to measure code quality and technical debt, and use some of those metrics as "indicators."

    It is always true that, in complex problem domains, no set of metrics will give you the whole picture. It is also always true that metrics can be overused to bad effect. However, we can use them as "indicators" for better products, and we can also inspect and adapt our measurements to help us provide better indicators. To be successful in today's competitive environment, we do have to measure, whether we like it or not.

  • Re: No Quality Metrics?

    by Charles Bradley, Scrum Trainer...,

    Btw Sean, I tend to agree with almost everything else you said on this page.

  • Totally missed the point

    by Ilja Preuß,

    I think you totally missed the point. In a complex work environment like software development - where it is impossible to measure all important aspects of the work - any metric used in a motivating way leads to dysfunction. Just using different metrics won't solve the problem, if you really want to reduce dysfunction effectively, you need to use them *differently*.

    The "value delivered" metric easily can lead a team to deliver more value now by building up technical debt, for example. And the "on time delivery" metric can keep them from taking the risks that often are necessary to find truly innovative solutions.

    If, as leaders, we care about effective, sustainable work, one of our main tasks is to work exactly on *not getting what we measure* - that is, to not use metrics as carrots and sticks, but as analytical tools and prediction devices, as tools that inform decision making.

    If you are interested in more about the why and how, I highly recommend the book "Measuring an Managing Performance in Organizations" (I'm in no way affiliated with the book).

  • Re: Totally missed the point

    by Ilja Preuß,

    The book title is "Measuring and Managing Performance in Organizations", sorry for the typo.

  • Re: Totally missed the point

    by Ilja Preuß,

    Two examples of metrics used in ways where they were not supposed to be optimized:

    I once worked on a team that started doing TDD, and we decided to measure the number of unit tests written during a sprint. The metric reminded us that we wanted to write tests, and it provided opportunities to celebrate as a team that we were implementing our decision. We weren't judged on the metric by anyone outside the team, though, and we didn't try to maximize it; as soon as writing tests became an integral part of our work, we abandoned the metric, because it wasn't useful anymore.

    And obviously team velocity is an important metric for predicting delivery dates and/or content, and even the feasibility of a product. The dysfunction doesn't come from measuring it, but from judging the team by it.
