Q&A and Book Review of Software Development Metrics

Oct 12, 2015 18 min read

The book Software Development Metrics by Dave Nicolette explores how to use metrics to track and guide software development. It explains how different development approaches and process models, like traditional waterfall-based or iterative agile software development, affect the choice and usage of metrics. The book provides descriptions of metrics that can be used for steering work and for managing improvement.

Free downloads of chapters of this book are available on the publisher's book page of Software Development Metrics.

InfoQ interviewed Nicolette about the purposes that metrics should serve, using metrics in software development, measuring velocity to track team improvement, measuring the mood of teams, preventing the misapplication of metrics, and the metrics he recommends that agile teams and their stakeholders use to deliver value.

InfoQ: What made you decide to write a book about software development metrics?

Dave Nicolette: The tl;dr answer:

(1) I consider metrics to be absolutely necessary for early detection of delivery risks; (2) I find metrics to be painfully boring; and (3) I observed that most people involved with software delivery had no idea what to measure or what to do with the data they collected.

Per (1) and (2), I wanted to find a minimalistic, pragmatic way to measure that would do the job of early detection without taking up too much of my precious time. Per (3), I wanted to help people discover metrics that were easy to understand and useful in their own work context.

The long answer:

I can't say just when I realized this, but at some point early in my career I noticed that in project after project people were blind-sided by delivery issues late in the process, when it may be infeasible to recover. They tried various ways of estimating, predicting, planning, and tracking the work, and still there were unpleasant surprises late in the game.

In later years, when working as a consultant or coach for software delivery organizations, I found that almost no one seemed to understand what to measure in order to detect emerging delivery risks early enough to deal with them. They carefully collected data but still they were blind-sided again and again late in the delivery process. This remains true today in virtually every organization I visit.

I've observed two general behavior patterns that seem to make this problem worse.

First, many project managers fall in love with one software tool or another. They like to play with the tool's features for querying, charting, and graphing data, and they lose sight of the meaning and value of the data they are tracking. They produce output so densely packed with data that the result is all noise and no signal. Then they are blind-sided by delivery risks late in the process. I see this often in companies that use enterprise-style project management tools that offer a multitude of options for configuration, querying, and charting.

Second, people who consider metrics inherently interesting, rather than a necessary evil, tend to approach the subject from a mathematical and statistical perspective. They enjoy playing with the math, and they tend to make the problem more complicated than necessary ... although they themselves don't consider their methods complicated. Ordinary folk in the trenches who are not mathematicians or statisticians find such methods confusing, and they can't use them well. So, they are blind-sided by delivery risks late in the process.

In 2009 I gave a talk on metrics at an "agile" conference. The room was small and hot and the battery in my brand-new laptop gave out after about 30 minutes. I continued with the talk using flip charts. The room was overcrowded and people were standing in the doorway and just outside in the hall. Despite these problems, when our time was up the people wouldn't leave. They wanted more. They asked about specific situations, and so forth. We had to be thrown out of the room to make way for the next talk.

I had similar experiences with user groups like PMI and APLN. There seemed to be a general thirst for useful, simple, practical metrics. Frankly I was astonished by the level of interest in a subject that I've always considered rather dry.

So, one day I thought it might be useful to compile some of this information in book form. Now that the book is out, we'll find out how useful it really is. It's a slender volume because the editors kindly removed most of my natural snarkiness, as a service to the reader. All that remains is plain information, intended to be practical.

InfoQ: In your opinion which purposes should metrics serve?

Nicolette: In the context of software delivery, I think there are two reasons to measure.

First, we need to know if the work is on track with respect to our expectations or plans. The earlier we can detect a variance, the better chance we will have of adapting to the variance without losing too much time or money. I call this "steering" the work. Exactly how you steer depends on organizational culture, management style, the drivers of the project, the nature of the work at hand, and other factors. But however you operate, you need objective data to know which way to steer.

Second, we need to know whether our attempts to improve delivery are helping or hurting. There is general agreement in the software field that it's a good idea to examine our delivery process more-or-less continually, and look for opportunities for improvement. There is a sort of Hawthorne Effect, if I may use the term loosely, in that when people are actively changing their process it feels as if things are improving. In reality, unless we compare our actual performance with improvement goals, we don't know whether our changes are improvements or not.

InfoQ: In the book you explore the usage of metrics for different development approaches, process models and delivery modes. Can you explain why these things matter for metrics?

Nicolette: This goes back to my preference for pragmatic and simple metrics that serve one or both purposes - steering and tracking improvement. Many people try to apply, believe they are applying, or simply misapply one software delivery framework or another. They use the metrics that are recommended for that framework, but those measures might be based on assumptions that aren't relevant in the particular organization. Compounding this issue, when people are using unfamiliar methods they tend not to apply the methods with perfect skill. They may be doing things in a way that renders the recommended metrics meaningless.

For that reason, it's important that people peel back the shiny outer layer of buzzwords and look at the way work actually flows in their organization. Then they can measure what is really happening instead of what they believe or wish were happening. Otherwise, they can look forward to being blind-sided by delivery risks late in the process.

The model I came up with to assess the way work flows is aimed at identifying the factors that are meaningful in a given organizational context, so that people can measure reality. For instance, there are many organizations that call themselves "agile" but actually carry out software delivery work in the traditional way. It does them little good to track measures such as Velocity or Running Tested Features, because the activities that would yield meaningful numbers are not happening at all.

Similarly, if a product is supplied on a "continuous beta" basis or an organization is using a Lean Startup approach to home in on customer needs, there is no predefined "end date." It isn't a project at all. In that situation, a metric like "percentage of scope completed to date" is absolutely meaningless.

InfoQ: Can you elaborate how a metric like "running tested features" can be used to track and manage progress towards delivery goals?

Nicolette: Sure. Like most metrics, RTF is based on certain assumptions about the way work flows in the organization. It assumes (1) an adaptive approach, (2) incremental delivery of production-ready features at least to a test environment, and (3) automated checks that exercise the code regularly.

With adaptive development there might be no predefined, fixed scope; we work toward business capabilities defined by our customers until sufficient functionality is in place to support those capabilities. The customer decides when the solution is good enough and they don't wish to spend any more on further refinement. With incremental delivery, we can "go live" at any time with the features that have been completed to date. With automated checks, we know immediately when we have broken something that had been working before. It gives us the confidence to "go live" at any time.

RTF is useful for tracking progress in that sort of environment. It supports adaptive development by showing progress toward customer-defined business capabilities, without the need for predefining 100% of planned scope. It supports incremental delivery by showing how many features are production-ready at any given time. It is meaningless without automated checks, because in that case any statements about the production readiness of completed features are only a matter of wishful thinking.

Now, consider a traditional software delivery process. Development does not begin before we have a pretty clear definition of 100% of scope. Delivery may be in phases or releases, or it may be "big bang" style. There is nothing to check until the code has been written. For those reasons, RTF is meaningless and useless for traditional development.

Like other metrics, RTF is useful when it is useful, and not useful when it is not useful. You can make the same sort of assessment of any metric - understand the assumptions on which it is based, and see if those assumptions hold in your organization. When you do this, it's helpful to ignore all buzzwords and look reality square in the eye. Reality won't blink, but that doesn't mean you have to.
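
As an illustration of the RTF assumptions described above (not taken from the book), the metric can be sketched as a simple count. The `Feature` record, its field names, and the sample data are hypothetical; in practice the check results would come from an automated build or test system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Feature:
    name: str
    delivered_on: date    # date the feature reached at least a test environment
    checks_passing: bool  # result of the latest automated check run (assumed available)


def running_tested_features(features, as_of):
    """Count features delivered on or before `as_of` whose automated checks pass."""
    return sum(
        1 for f in features
        if f.delivered_on <= as_of and f.checks_passing
    )


# Hypothetical sample data for illustration only.
features = [
    Feature("login", date(2015, 9, 1), True),
    Feature("search", date(2015, 9, 15), True),
    Feature("checkout", date(2015, 9, 29), False),  # delivered, but a check is failing
    Feature("reports", date(2015, 10, 13), True),   # not yet delivered on Oct 1
]

print(running_tested_features(features, date(2015, 10, 1)))
```

A feature whose checks fail drops out of the count, which is why the number says nothing useful when automated checks do not exist.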

InfoQ: Most Agile teams measure their velocity to know how much they can deliver. Velocity can also be used to measure whether a team is improving. Can you explain how?

Nicolette: Well, Velocity is another metric that depends on certain assumptions. It depends on (1) a time-boxed process model, and (2) incremental delivery of production-ready features at least to a test environment. Provided these assumptions hold, Velocity is useful for short-term planning and also to accumulate empirical data for burn charts, which in turn can expose emerging delivery risks. So, it's useful for steering in cases when the work is done in a certain way.

In my experience, Velocity is a little too slippery to use for tracking improvement. There are three reasons.

First, a team's improvement efforts might include changing the length of the time-boxed iterations or shifting away from a time-boxed model altogether. Either of these changes would make it meaningless to compare new Velocity observations with historical ones.

Second, as teams get used to the problem domain, the technologies of the solution, and working together, it becomes possible to deliver User Stories of a given size in less time than at the outset of the project. As this happens, teams tend to adjust their unspoken consensus about relative sizing. In Iteration 5, a team may consider a certain amount of work to represent a story size of 3. In Iteration 15, their size 3 includes more work than it did in Iteration 5. This happens naturally and not necessarily as a conscious decision by the team. It may look as if the team is delivering the same number of story points in Iteration 15 as in Iteration 5, when in fact their delivery performance has improved.
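
The sizing drift described above can be made concrete with a little arithmetic. This is only a sketch with invented numbers: `effort_hours_per_point` is a hypothetical quantity that a real team cannot observe directly, which is exactly why the drift goes unnoticed.

```python
# Hypothetical numbers for illustration: same story sizes in both iterations,
# but a "point" silently represents more work by Iteration 15.
iterations = {
    5: {"story_sizes": [3, 3, 3, 3], "effort_hours_per_point": 4.0},
    15: {"story_sizes": [3, 3, 3, 3], "effort_hours_per_point": 6.0},
}


def velocity(story_sizes):
    """Velocity: sum of the sizes of the stories completed in an iteration."""
    return sum(story_sizes)


def work_delivered(story_sizes, effort_hours_per_point):
    """Work actually delivered, using the (unobservable) effort behind each point."""
    return velocity(story_sizes) * effort_hours_per_point


for number, data in iterations.items():
    print(
        f"Iteration {number}: velocity={velocity(data['story_sizes'])}, "
        f"work delivered={work_delivered(**data)} hours"
    )
```

Velocity reads 12 in both iterations, while the work behind those points grows from 48 to 72 hours in this made-up case, so the chart hides the improvement.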

Third, many organizations attempt to apply "agile" methods but they don't quite "get it." It's common for management to set expectations for Velocity. Sometimes, management even asks teams to "stretch" to hit Velocity targets. Velocity is designed to be an empirical observation of actual delivery performance. When we set Velocity targets, we completely break the metric. It becomes utterly meaningless. In that sort of environment, it is useless for tracking improvement. Unfortunately, that sort of situation is very common.

So, to use Velocity to track improvement certain things must remain true throughout the period of measurement: (1) The iteration length must remain the same; (2) relative sizing must remain consistent; (3) management must not set Velocity targets. It is rare for all these things to remain true for any length of time. Therefore, I don't recommend Velocity for tracking improvement. I find Cycle Time to be a practical alternative that gives pretty much the same information without being sensitive to process model, sizing and estimating practices, or target-setting.
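
For comparison, Cycle Time needs only a start date and a finish date per work item. The sketch below assumes cycle time is measured in calendar days from start to finish; the item data and the "before" and "after" grouping are hypothetical and not taken from the book.

```python
from datetime import date
from statistics import mean


def cycle_time_days(started, finished):
    """Elapsed calendar days between starting and finishing one work item."""
    return (finished - started).days


def mean_cycle_time(items):
    """Average cycle time over a list of (started, finished) date pairs."""
    return mean(cycle_time_days(started, finished) for started, finished in items)


# Hypothetical work items completed before and after a process change.
before_change = [
    (date(2015, 9, 1), date(2015, 9, 4)),
    (date(2015, 9, 2), date(2015, 9, 7)),
    (date(2015, 9, 8), date(2015, 9, 12)),
]
after_change = [
    (date(2015, 10, 1), date(2015, 10, 3)),
    (date(2015, 10, 5), date(2015, 10, 8)),
    (date(2015, 10, 8), date(2015, 10, 9)),
]

print(mean_cycle_time(before_change), mean_cycle_time(after_change))
```

Nothing in the calculation depends on iteration length, story sizing, or targets, so the comparison still holds if the team changes any of those.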

InfoQ: There are ways to measure the mood of teams, such as the emotional seismogram or the happiness index, which you describe in the book. Do you have examples of teams using these metrics that you can share?

Nicolette: I'm not sure how to cite individual teams as examples, but I can share some general observations.

I've seen the Niko-Niko Calendar serve a useful purpose on newly-formed collaborative teams. As they progressed through the Storming and Norming phases (per Tuckman), I've observed that having an indication of each team member's general mood helps people keep behaviors in perspective. For instance, if I don't know Joe, and Joe is rude to me at work, I might make the assumption that Joe is a rude person. My expectations of Joe will be colored by that experience going forward. However, if Joe posted a frowny face on the calendar that morning, I will know his rudeness is only a reaction to something in his life unrelated to work; it's no big deal, and tomorrow is a new day.

Why is that important? Because people perform better at work when they get along well and when they have generally good morale. Understanding each team member's emotional state defuses tension that can occur when we overreact or make assumptions based on small behaviors out of context, especially when we don't know the person well. I won't function at my best if the question, "What the hell's wrong with Joe?" is floating around in the back of my mind all day. When teams reach the Performing phase, they generally dispense with the Niko-Niko Calendar.

The Niko-Niko Calendar can also expose organizational issues. The Omega Wolf pattern mentioned in the book is a case in point. When one team member always seems to be negative, regardless of the other team members' moods, it usually doesn't mean that person is a downer. More often, it's a symptom of systemic, organizational issues. If we fired the glum one, someone else would soon fill the same role because it isn't a personal thing. There's something about the work environment that is creating the need for this behavior. It's some kind of safety valve that enables the rest of the team to function. The true corrective action is to change the organizational parameters that are creating the need for an Omega Wolf. Emotional measures can sometimes reveal this kind of problem when mechanical measures can't.

The Emotional Seismogram can serve the same purpose as Niko-Niko, provided team members update their mood every day at the same time of day. In that case, it's just the same information as Niko-Niko displayed in a different form. But in all cases I've seen so far, teams use the Emotional Seismogram as a retrospective tool. They fill in their mood as they remember it, looking back. My observation is that the results tend to be strongly affected by the outcome of the iteration and by the results posted by team members who went first. If things turned out well, then people remember their mood as having been positive. I tend not to use this measure, as it can be unreliable or misleading.

The Health and Happiness Index, which I learned from a ScrumMaster at a client company, can have an interesting effect. It's used as a retrospective tool. Sometimes, team members feel bad about an iteration because it was difficult, and not because the outcome was poor. The reverse can occur, as well. The Health and Happiness Index compares observed delivery performance (Health) with perceived performance (Happiness). It can bridge the gap between perception and reality.

I've seen teams come into a retrospective feeling down, and after going through the Health and Happiness exercise they realize they had accomplished some very good things after all. They felt bad about the iteration because they were down in the weeds working on hard problems the whole time.

I've also seen teams come into a retrospective feeling like they were the masters of the universe, only to discover that the work they had delivered was incomplete for one reason or another - they had never integrated their code because of a dependency on a third party; their User Stories were really only tasks masquerading as User Stories, and nothing of value made its way to the customer; etc. The metric can help teams learn to focus on important things and to recognize when they are making assumptions connected with the difficulty of the day rather than the business outcomes of their work.

InfoQ: Can you elaborate how metrics can be misapplied? Are there things that can be done to prevent this?

Nicolette: It's hard to know where to begin. I think at least 1/3 of the book discusses anti-patterns in the use of metrics. The inspiration for the book was the observation that people tend to measure the wrong things, or measure the right things in the wrong way. Misapplication of metrics is the rule, not the exception.

Think of a metric as a tool. To get value from a tool, you need to do two things: First, choose an appropriate tool for the task. Second, use the tool correctly. If I need to drive a screw, a hammer would be the wrong tool. If I choose a screwdriver, but I bang on the head of the screw with the handle, I have the right tool but I'm not using it correctly.

The same rules of thumb apply to metrics. Consider Velocity, for example. Many times teams tell me they are tracking Velocity because management told them to "use agile," and "agile" says "track Velocity." (FWIW I haven't found that in the Manifesto.) But they aren't delivering potentially-shippable solution increments in every iteration. They aren't even working on end-to-end features. They literally have no Velocity. They are counting something and charting it, but it isn't Velocity. Bang, bang, bang. They can't use the metric to predict anything, except we can predict they will be blind-sided by delivery risks late in the process.

I've seen Velocity abused in the opposite way, too. Teams that are chunking up the work in a traditional project and pretending to run "iterations" often use Velocity as a proxy for "percentage of scope completed to date." This isn't Velocity at all, it's just tracking task completion. Management expects a certain amount of work to be done in each iteration in order to stay on schedule. Teams simply game the numbers to avoid getting into trouble. This leads to the predictable outcome: They are blind-sided by delivery risks late in the process.

There are probably dozens, if not hundreds of similar examples involving any metric you care to name. The general cause is that people are trying to measure things that aren't really happening, because they believe those things should be happening.

InfoQ: Are there specific metrics that you want to recommend for agile teams that they can use to deliver value?

Nicolette: The short answer is "No." I'm not crazy about the term "agile team." I think of "agile" as a school of thought that offers a lot of great ideas for adaptive software development. But adaptive development can be done without any reference to "agile," and effective "agile" development can be informed by other schools of thought, such as Lean Thinking and Systems Thinking. "Agile" methods can be and often are applied to traditional projects. It's also the case that many teams labeled "agile" are far from it. So it is, in a sense, a magic word. I enjoyed Harry Potter, but, you know.

In fact, one point I try to make in the book is that people ought to set aside buzzwords and look at how the work actually flows in their organization. That is the basis for choosing appropriate metrics, and not the magic words that happen to be in play at the moment. Next year there will be a new set of magic words, but software will still be software.

InfoQ: A similar question for the managers and other stakeholders of agile teams: can they use the same metrics, or would they need different ones?

Nicolette: For the purpose of steering, the interests of all direct stakeholders of a software delivery team are the same. So my answer is "yes," they can look at the same numbers and see the same information, and discuss corrective actions together with a common understanding. But this is not limited to "agile teams," as the question states. It applies to all software delivery teams.

When it comes to metrics associated with improvement, I tend not to recommend sharing the numbers with anyone outside the team. In most organizations there is high risk that management will use such measurements to cause pain for the team (even with the best of intentions). Metrics to track improvement are best used by a team to inform their own improvement efforts. Once they have achieved a performance improvement goal they were tracking by a certain measure, they need not continue to track that measure.

Get a 40% discount by using the code "sdmiq40" when buying the book on the publisher's site.

About the Author

Dave Nicolette has been involved in the software industry since 1977 in a variety of roles. In recent years he has been working mainly as an agile and lean coach at the team and organizational levels.
