
How to Tell Your Boss Story Points Are a Terrible Metric


Summary

Liz Ince and Chris Wilkinson look at the different measures commonly found in software development and what they're used for, highlight the good, the bad, and the ugly aspects of measuring progress versus productivity, and explain how all metrics should relate to a desired business outcome. They share stories of what happened when the wrong thing was measured and offer some practical ways to help.

Bio

Liz Ince has been working in the IT industry for over 30 years. She is now a Managing Solutions Architect at Capgemini. Chris Wilkinson is a Software Development Lead at Capgemini.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Ince: Whenever I go to a project review meeting with senior managers and stakeholders, this is about the only question I ever hear.

[Demo video start]

Child 1: Are we there yet?

Child 2: Are we there yet?

Child 3: Are we there yet?

Child 1: Are we there yet?

[Demo video end]

Ince: But of course we do have to tell senior managers and stakeholders when we will get there.

Wilkinson: We've all been in those moments where we start a new job or go on a new engagement where somebody mentions the term points. And whether that's story points or in some other guise, how is this being calculated? How am I expected to work to this metric, which I don't yet understand? That deep-rooted apprehension tells us that there may be a better way to do this. Now, Liz and I are aware that we're all that's standing between you guys and getting a beer for the evening.

Ince: No. Prosecco.

Wilkinson: We're going to share with you some ideas and some alternatives. My name's Chris Wilkinson. I'm a lead software engineer and architect for Capgemini UK.

Ince: And I'm Liz Ince, and I'm an enterprise architect also working for Capgemini in the UK. Thank you. Before we talk about story points in particular, I'd like to spend a few minutes talking about metrics and measurements in general. Heisenberg's uncertainty principle tells us that the act of measuring something changes the thing we were trying to measure in the first place. So no matter what we do, our measurements are never going to be totally accurate. People often tell me, if you can't measure it, you can't manage it. Personally, I think that's a cop-out, because businesses manage things they cannot measure all of the time. If you have a target and you meet that target, why should you bother to address any other problem or issue? In this way, measurement can actually, unintentionally, stifle innovation.

Of course, don't forget, it actually takes effort to produce the metrics. How much time are you actually spending collecting the numbers? What about all those desirable characteristics you want to encourage? What about the senior developer who takes time to help and mentor a more junior colleague? Or what about the team who worked together to coordinate their efforts to maximize code reuse? Are they any less productive? And if we only manage things we can measure, what about the things we cannot measure? What about things like work ethic, loyalty, dedication, team spirit, cooperation? And because, of course, we are talking about metrics and measurement here, honesty.

Metric Fixation

I'm a bit of a fan of an author called Jerry Muller, and Jerry Muller wrote a book called "The Tyranny of Metrics." In this book, Jerry looks at the process of measurement against targets from the point of view of the people being measured. Jerry also identified something he calls metric fixation. Metric fixation is based upon three beliefs that are prevalent in organizations globally. On the face of it, these beliefs are entirely logical.

The first one is that we put in place some standardized metrics, such as the crime rate for a town or city, and then we measure against those metrics - so, crimes reported and solved - and then we publish the results as rankings and ratings. Now, standardized metrics are always put in place with the very best of intentions, such as the organization that wants to remove bias and ensure that all employees are managed fairly, without favoritism. And, of course, we all have a sneaking suspicion, don't we, that some public organizations and bodies are run more for the benefit of the employees than the public they serve. So, the idea is we put the standardized metrics in place, and we make these public bodies accountable to us, the public.

However, if you put in place standardized metrics without care, they can give you some rather unintended consequences. Everybody in the room, you are no longer working in software. You are no longer developers, you are all now heart surgeons, okay? So, you are all brilliant heart surgeons, why would you be anything else? Of course. Now, as heart surgeons, you are measured against two ratings: the number of people you cure, and your annual mortality rate. The people you fix and the people you kill. So, you are all brilliant heart surgeons. Your rankings are absolutely excellent. You're at the top of your game. You are worshiped by your colleagues and feted by your friends and family, and you are number one.

One day two patients come and see you. The first patient needs a standard operation. You've done it many times before, it's not a problem. It will just add to your brilliant results. The second patient, however, is seriously ill. They need a very complicated and difficult operation, one that you haven't really done before. The trouble is if you don't do this operation, the patient will die. So the question I have is, do you operate on the patient and risk your rating, or do you send the patient to another surgeon to do the operation so you preserve your excellent score?

Can I have hands up from those people who would do the operation and risk their rating? And hands up from those people who would send this poor guy off to another surgeon to do the job and keep their rating? Slightly more, I think, said they would do the operation. You're a lovely, caring audience, and for that, I am really grateful. This is not a hypothetical case. In America at the moment, this is what is happening. Patients who require difficult, dangerous operations tend to have to shop for a surgeon, because so many are concerned that it will damage their rating.

Let's go back to 2008, Wells Fargo. Wells Fargo put in place a new sales plan. In this sales plan, any sales rep who doesn't make their quota is sacked. A short time later, there was an investigation of Wells Fargo by the financial regulator. It turned out that over 5,000 employees had taken out financial products themselves in an effort to make their quota. Okay, for some of them, it was to get a bonus. But for the vast majority, it was an attempt to keep their jobs. Would anybody like to guess the size of the fine that was then handed out to Wells Fargo as a result of this? Anybody want to shout out for me?

Participant 1: $100 million.

Ince: $100 million. Not far off. $185 million as a result of this.

I'm now going to tell you a story. Robert S. McNamara was a Harvard MBA student who went on to become a brilliant Harvard Business School professor. Robert exemplified the cool, rational application of planning and analysis to business. He was so brilliant, he was recruited by John F. Kennedy to work for the American government. Now, John F. Kennedy wanted one single, simple metric to tell him whether he was winning the Vietnam War or not, and he wanted Robert to do that for him.

Now, Robert had never been in the army and had no military experience whatsoever. It is about this point that the law of unintended consequences starts taking a hand in the Vietnam War. So, Robert thinks about what he's been asked to do, and he ponders a bit, and then he thinks, "Well, I know. If we kill more of their soldiers than they kill of ours, we'll be winning. So, if we have a battle and we kill 100 of their soldiers, but they only kill 20 of ours, we'll have won." Robert comes up with this metric, and it's called body count. It doesn't take the army and marine officers long to realize that promotions and medals are largely being given on the basis of confirmed kills.

So the army units stop cooperating with each other, and they start arguing over who can claim which dead body. They send soldiers out into the middle of the battlefield with clipboards to count the dead. Now, a soldier in a battlefield with a clipboard is a sitting duck to a Viet Cong sniper. Soldiers were literally dying to collect the metrics. So, what happened when they didn't have enough dead bodies? Well, that was simple. They just went out and shot a few civilians.

What Drives People?

What does motivate people to get out of bed and get into the office for 9:00 on a Monday morning? It's a very interesting question. An American psychologist called Frederick Herzberg researched this question. He came up with something called the motivation-hygiene theory, quite often just called the two-factor theory. What Frederick realized was that there are certain basic needs that are common to all people around the globe, and he called these the hygiene factors. There are four: food, shelter, family, and companionship. It doesn't matter what country, what religion. This is common to all human beings on the planet. As I said, these are the hygiene factors. People are motivated to achieve their hygiene factors. So, for most people, what this translates to is having to earn a certain minimum amount of money to pay the rent or the mortgage, put food on the table, and send the kids to school with clothes on their backs and shoes on their feet.

Now, the interesting thing happens when you have earned sufficient money to achieve your hygiene factors. At this point, the second factor, the motivation factor, kicks in. But it is also about this time that money ceases to be a major motivator. So if money isn't a major motivator, what is? People want to think that they are contributing. They want to think that they are doing a good job. They want to think that their work is being rewarded and respected. They want their friends and their family to be proud of them. They want to feel that they are contributing towards society, and they want to feel that they are part of a bigger goal.

So, John F. Kennedy is visiting NASA in 1961, first time he's been to NASA. He's being shown around the building and JFK comes across a janitor who is mopping a corridor. So, JFK goes up to the janitor and he said, "Hi, I'm JFK, and what's your role here in NASA?" The janitor turns around to JFK, and he says, "I'm helping to send a man to the moon." The janitor is part of a larger goal.

I'm going to try something. Chris here is a really nice, helpful, friendly kind of guy. Chris, I've got a flat tire on my car. Would you come and help me change my car tire, please?

Wilkinson: Of course.

Ince: See? Really nice, helpful, friendly guy. I'm going to try it another way. Chris, I've got a flat tire on my car. If I pay you a pound, would you come and help me change my tire, please?

Wilkinson: I'm worth a bit more than that. I'll take 20 quid.

Ince: That's really interesting, isn't it? Chris will change my tire for nothing, but he will not do it for a pound. So, quick recap. Goodhart's law says that when a measure becomes a target, it ceases to be a good measure, and you should get another one. Gaming the system: maybe we shouldn't refer to the Vietnam War, but you get the point about that one, I'm sure. Reward and punishment: there's a lot of research showing that offering people a reward can actually make them perform worse. Think about what it is you're doing there. And motivation: please make sure you're not trying to offer Chris a pound to change a car tire, because I can assure you it really won't work.

History

Wilkinson: Thanks, Liz. I'm going to talk a little bit about the history of metrics in software. The history of software metrics dates back to the '60s, when the lines of code measure, or KLOC for thousands of lines of code, was used. Now, this was used to measure productivity and program quality, sometimes as defects per KLOC. The need for more discriminating measures came with the diversity of higher-level programming languages. A line of code in assembly language, for example, is not equal to a line of code in a high-level language like Java in terms of effort, functionality, and complexity.

The '70s saw an explosion of interest in measures of complexity, and things like function points were devised, designed to be language-independent while accounting for complexity and scale. Now, that was defined by a guy called Allan Albrecht at IBM, and it was a measure to express the amount of business value a functioning system could provide to a user. Various things came from that, like the IFPUG method and COSMIC function points, for example. Now, Albrecht's research observed that function points were highly correlated with lines of code. So, why not use a more objective measure like the one it replaced, lines of code? Complexity also very much needs to be known ahead of time, and that leads to a much more waterfall-like design process, as we're aware of today.

Now, Agile got its roots in the software development space with the introduction of the Manifesto for Agile Software Development in 2001. It was created by a cadre of founders with 12 founding principles of Agile development. That includes gems such as "our highest priority is to satisfy the customer through early and continuous delivery of valuable software," and "at regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly."

"Story points" is standard if any particular phrase is. "Gummi bears" has been popular since the early days of extreme programming, and "Nebulous Units of Time," or NUTs, has enjoyed some currency. The most egregious mistake is to invest too much time or debate into the choice of unit for estimates, insofar as scheduling based on velocity makes this unit inconsequential. Velocity is an in-team comparison only. You often find that bad managers will use this to compare teams with other teams. Now that obviously inhibits collaboration, and it encourages team versus team gaming.

But the message we have is that story points do have a place. What is wrong in our industry today is that, as a unit, they are abused and, as a phrase, often overloaded. They are a capacity planning tool, not a measure of productivity. I've got a question for you. Probably got the right audience for this. Does anyone know the difference between mass and weight? Shout out if you would.

Participant 2: Weight is the mass times gravity.

Wilkinson: It is indeed. Yes. So, what do you measure on a set of bathroom scales?

Participant 2: Your weight.

Wilkinson: Yes. And what unit of measurement do you use with the scales?

Participant 2: The wrong one.

Wilkinson: Correct. You might measure your weight in kilograms, and as we're aware, a kilogram is a unit of mass. Does that mean that your scales would report accurately on the moon? Probably not, right? A few weeks ago, when I was researching for this talk, I came across this tweet. It states, "Metrics are just a proxy for what you really care about, and unthinkingly optimizing a metric can have unexpected negative results." So, why do I think that's relevant here? Well, not only are metrics contextual to the environment within which they are used, but also to the cultural attitudes that are imposed upon them.

What Happens When It Goes Wrong?

I want to share with you a couple of examples. Back when I started my first job out of university, I worked for a small Java house in Harrogate in North Yorkshire. We were very active in the local dev community, and we hosted a few XP meetups and things like that. In my first week, I went to one of the morning meetings, which later became the standups. There was a very smart Australian guy, one of the senior developers, and he had a bit of banter with the chief scientist who was running the meeting at the time. He said, "I'm going to farm as many story points today as I can." And he completed 13 story points in a single day, with a velocity of around 1.2. From what I knew of the process so far, I was very impressed by that. I was in awe of him, but that was actually my first experience of the system being gamed.

I'd like to share with you another example, from more recently in my career. I've worked with a client through Capgemini. It's a large client, and they have over 300 staff on this particular program. They've really struggled with measuring progress and productivity. They use story points in their 10-plus feature teams, and this was picked up by the program management and given the term program points. I've been passive-aggressively referring to those as pick two points in meetings, rightly or wrongly, I suppose. But at a recent all-hands event, the program leadership couldn't answer whether that was a measure of time, complexity, or scale.

Now, statements of work, and thus monetary value, and roadmaps have been determined using this metric. Scrum teams are also being held to account when they don't deliver on them. Now, we know that that's going to cause some issues. They're seeing problems with delivery, planning, and reporting, amongst other things, of course. If a single metric doesn't work, what happens when we introduce multiple metrics that balance each other out?

Now, it's important to note that story points are subjective and relative to a designated baseline story, or BS, as I like to call it. But the important thing to note is that two teams' estimates cannot be compared. Not only that, but two sprints' estimates cannot be compared. In any given sprint, each story is given a points value based on the designated baseline story, and also on all the other stories within the same sprint.
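To make that relativity concrete, here is a minimal Python sketch with invented numbers, showing story points as an in-team capacity planning tool. The team names, baseline sizes, and velocities are all hypothetical.

```python
# Minimal sketch, with invented numbers: each team sizes work against its
# OWN baseline story, so "a point" is a different unit in each team.
team_a = {"baseline_points": 1, "velocity_per_sprint": 20}
team_b = {"baseline_points": 3, "velocity_per_sprint": 12}

def sprints_needed(backlog_points: float, team: dict) -> float:
    """Capacity planning: sprints this team needs for a backlog sized in
    its own points. Meaningful only within that one team."""
    return backlog_points / team["velocity_per_sprint"]

print(sprints_needed(60, team_a))  # 3.0 sprints for team A's "60 points"
print(sprints_needed(60, team_b))  # 5.0 sprints -- a different "60"

# Comparing 20 points/sprint with 12 points/sprint says nothing about
# productivity: the units differ, like comparing 60 miles with 60 km.
```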

If we can't use story points, what can we use? Well, there's a myriad of things that we already measure. There are a lot of other things which we could measure quite easily that we sometimes probably don't: things like duplication, code quality, build successes, infrastructure utilization, complexity, the number of requests. Obviously, there's a huge list there.

Let's take a look for a second at a typical reporting structure, up and down, within an organization. What's keeping these people up at night? Stakeholders care about things like: when is my software going to be released? How much is it going to cost? Business value for them is things like, how many pairs of shoes have I sold? Project managers care about things like release dates, and how much of my budget have I burned? And developers care more about things like code quality, the number of tests, and things like that.

Metrics within the SDLC

Let's take a look at a typical software development lifecycle. We capture and apply many of these things throughout that cycle. When we write code and commit it to source control, we can count the number of commits. When we execute a build, we can take some code quality statistics. When we deploy our artifacts, we can take note of the deployment frequency, how often we're deploying that artifact. And when we come to release our product, we can note what kind of functionality we've delivered, or how many stories we've completed.
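As a rough illustration of harvesting one of those signals, here is a small Python sketch that counts recent commits in a local git repository. It assumes git is on the PATH and that the script runs inside a repository; the date range is purely illustrative.

```python
# Count commits reachable from HEAD within a recent window, using git's
# own rev-list plumbing. Illustrative only; a CI server or dashboard
# would normally collect this kind of signal automatically.
import subprocess

def commit_count(since: str = "2.weeks") -> int:
    """Number of commits on the current branch since the given git date spec."""
    result = subprocess.run(
        ["git", "rev-list", "--count", f"--since={since}", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

if __name__ == "__main__":
    print(f"Commits in the last two weeks: {commit_count()}")
```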

That is by no means a comprehensive list, of course. Now, if my slide and the contrasting colors don't blind you for a second: interestingly, if you overlay those different layers, there's some correlation between the phases of the software development lifecycle and the people who care about the metrics at each stage. So stakeholders, for example, care about things later within the software development lifecycle, like: what is our MTTR, mean time to restore? What SLAs are we working to? What functionality have we delivered? They ask questions like, can users do things that they couldn't do before?

There's an excellent book. It's called "Accelerate: The Science of Lean Software and DevOps." I've heard it mentioned a couple of times this week already. How many of you have read the book, out of interest? Wow, I was expecting more than that, to be honest. It's written by Nicole Forsgren, Jez Humble, and Gene Kim. You can learn more about their organization, DORA, DevOps Research and Assessment, on their website. Now, in the book, there's an excellent justification for utilizing metrics that promote primarily two things: throughput and quality.

Measures That Correlate with High Performance

These are the metrics that are highlighted in the book. We've got lead time, that is, the time it takes from something being committed to when it is running in your production environment. The deployment frequency, how frequently are we deploying our artifacts to our production environment? The mean time to restore, so how quickly, on average, can we restore service after an outage of some kind, like a DR scenario? And the change fail percentage. So, what percentage of stories or functionality require some kind of remedial action like patching or fix forward or some such?
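As a minimal illustration, here is a Python sketch that computes those four measures from simple deployment and incident records. All of the timestamps and records are invented.

```python
# Compute the four measures from invented deployment and incident logs.
from datetime import datetime
from statistics import mean

# Each deployment: when the change was committed, when it reached
# production, and whether it later needed remediation (patch, fix-forward).
deployments = [
    {"committed": datetime(2019, 3, 1, 9), "deployed": datetime(2019, 3, 1, 15), "failed": False},
    {"committed": datetime(2019, 3, 2, 10), "deployed": datetime(2019, 3, 3, 11), "failed": True},
    {"committed": datetime(2019, 3, 4, 9), "deployed": datetime(2019, 3, 4, 12), "failed": False},
]
# Each incident: outage start and service-restored timestamps.
incidents = [{"start": datetime(2019, 3, 3, 12), "restored": datetime(2019, 3, 3, 13, 30)}]

def hours(delta):
    return delta.total_seconds() / 3600

lead_time = mean(hours(d["deployed"] - d["committed"]) for d in deployments)
days_observed = (deployments[-1]["deployed"] - deployments[0]["deployed"]).days or 1
deploy_frequency = len(deployments) / days_observed
mttr = mean(hours(i["restored"] - i["start"]) for i in incidents)
change_fail_pct = 100 * sum(d["failed"] for d in deployments) / len(deployments)

print(f"Lead time:        {lead_time:.1f} hours")
print(f"Deploy frequency: {deploy_frequency:.2f} per day")
print(f"MTTR:             {mttr:.1f} hours")
print(f"Change fail rate: {change_fail_pct:.0f}%")
```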

Now, a successful measure should have a global outcome. It should mean something to everyone. Anything that demonstrates consistent throughput or operational stability could be considered for that list. This is about outcomes and not output, quality and not quantity, business benefit and not just being busy and writing code for the sake of it. Now, they should incentivize improvement and disincentivize errors. Let's consider for a second what Liz said about what motivates people. Also, we should consider the intangible things, like happiness, loyalty, team spirit, and cooperation.

Actually, there's an interesting point that I'd nearly missed, something that we've noticed in our engagements at Capgemini. Contracts where delivery metrics like story points are used as a key contractual KPI are much more likely to fail than those whose key contractual KPIs relate to business value. We've seen that demonstrated recently in some of our public sector engagements.

Going back to those metrics. Now, each one of those creates a virtuous cycle, and it focuses the team on continuous improvement. To reduce lead time, we have to reduce wasteful activities. In turn, that lets you deploy more frequently, and it forces the team to improve practices and automation. Improving practices and automation means that your speed to recover from failure is improved, and that increase in better practices, automation, and monitoring will reduce the frequency of failures.

Westrum Typology of Culture

Earlier, I talked about the intangible things. The biggest predictor of job satisfaction is how effectively an organization processes information, and that's determined by the three cultures model from an American sociologist called Ron Westrum. Now, there are three categories in Westrum's model. Those are the pathological, or power-oriented; the bureaucratic, or rule-oriented; and the generative, or performance-oriented. Now, the report states that they use a series of Likert-style questions to measure and categorize organizations and place them into this model - Likert scaling being a technique that we're probably all familiar with, where responses range from strongly disagree to strongly agree.

Now, this allows us to place a value on the intangible and gives us some numbers that we can analyze. Here are the three models from Westrum's typology of culture. Earlier I mentioned one of our clients that was struggling and misusing the metrics. So, I posed those Likert-style questions to all of the tech leads within their feature teams. Now, possibly not surprisingly, that would categorize them as a bureaucratic organization. There's no foul play here, but I suspect that not only are they misusing the metrics, the organization is so large that, in trying to fix things, they are imposing rule after rule, and the result is a bureaucratic environment.
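Here is a small Python sketch of that scoring idea: Likert-style responses turned into a number you can analyze. The survey items only paraphrase the style of Westrum-model questions, and the answers are invented.

```python
# Turn Likert-style responses (1 = strongly disagree ... 5 = strongly
# agree) into a single analyzable number. Items and answers are invented.
questions = [
    "On my team, information is actively sought.",
    "On my team, failures are treated primarily as opportunities to improve.",
    "On my team, responsibilities are shared.",
    "On my team, cross-functional collaboration is encouraged and rewarded.",
]

# One list of answers per tech lead, in question order.
responses = {
    "lead_1": [4, 3, 4, 2],
    "lead_2": [3, 2, 3, 3],
    "lead_3": [2, 2, 3, 2],
}

def culture_score(answers):
    """Mean across all respondents and questions: higher leans generative,
    lower leans bureaucratic or pathological."""
    flat = [score for person in answers.values() for score in person]
    return sum(flat) / len(flat)

print(f"Mean culture score: {culture_score(responses):.2f} out of 5")
```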

I'd like to read you a small section from Westrum's paper, and it says this: "When bureaucratic organizations need to get information to the right recipient, they are likely to use the standard channels or procedures. These standard channels and procedures are often insufficient in a crisis. They failed, for instance, in communications between the New York police and fire departments on the 11th of September, 2001. The police knew that the World Trade Center north tower was about to collapse, but failed to communicate this to the firefighters inside the tower, many of whom died. By contrast, in the same circumstances, many generative organizations would cross departmental lines or use a back channel to get the information to where it was needed. The Apollo 13 space crisis shows an excellent example of a generative response. And by contrast, the fumbling that led to the demise of the Columbia space shuttle shows bureaucracy at its worst."

Now, actually, this reminds me of something that Liz mentioned while we were preparing this talk. It's something that Jennifer said. She studied psychology, and she said that job satisfaction is correlated with how much control you have over your working environment.

Ince: Jennifer told me about an American sociologist called Robert Karasek, and Robert Karasek looked at employee effectiveness in the workplace. He looked at several measures but primarily workload and work stress. Workload is made up of a number of measures including work rate, the amount of time given to complete a task, the degree of difficulty of the task, and the amount of effort required to complete the task. Now, those are the psychological workplace stressors.

Robert also coined the phrase decision latitude, which is the degree of control or freedom that an individual employee has to manage or organize their own workload. These are all directly correlated. The more individual control an employee has to organize their workload, or the more decision latitude they have, the less stress they feel. The less stress they feel, the more effective they are, and therefore the more productive they are.

If we now go on and look at these organizations again, we can instantly see that the bureaucratic, rule-oriented organization is very likely not to give employees any freedom at all. They will be dictated to as to how and when they complete their tasks. So, there's very little freedom, very little decision latitude, hence the employees will experience a lot of stress. Because they have high stress, they will have low effectiveness, and the bureaucratic organization will have low productivity.

On the other hand, the generative organization will probably give its employees a lot of freedom. They'll probably be able to decide how and when they complete the tasks that they are supposed to be doing, which lowers their stress, increases their effectiveness, and it is the generative organization that will have a highly productive development team.

Chris reminded us earlier that the business stakeholders are pretty much only interested in business results. For a lot of business stakeholders, if this is a large project or development that's being worked on, then they may well have bet their careers on the business outcome. Given that they've bet their careers on the business outcome, we want them to feel confident that we're going to support them. So it's really important that we have the right metrics to give them the right information. We need to give them the information in a way they understand, so that they are confident that we can meet the project deadlines, that we can deliver on time and to budget. If we don't give them the information, they will start to worry that we're going to miss the deadline, and because they bet their careers on this, as soon as they start to worry, they will start to micromanage downwards. And that is not a happy place for any of us to be in.

What to Discuss with Your Boss

On Monday morning, I think it might be time to go in and remind the boss about the human factors. It's fine to put in place measurements and targets, but you must apply discretion and judgment, and not just blindly follow the rules. I know it sounds a bit trite, but it's been shown that things like developer of the week and team of the month really do motivate people. You can also remind your boss that people are not as motivated by money as managers would like to think they are. Finally, our unmeasurables - loyalty, team spirit, honesty - might be unmeasurable, but they are not unimportant.

Wilkinson: To summarize, what we are saying is that story points do have their place, but it's to indicate capacity, not productivity. They are relevant only to the team in which they're applied, whose velocity is unique to them alone. Metrics do not equal targets. There is no silver bullet. No one metric can do this alone; it requires a series of balanced metrics that promote throughput and quality, with the bonus of improving the culture in your organization.

Now think back to the pyramids. We need to be sure that we're reporting the correct statistics up to the business so that they can make the right decisions. Finally, the only one true measure is working software that provides business value. And we've shown you a few things that you could try to improve. So, in the true spirit of fail fast and fail often, please go and give it a go. Thanks.

Questions & Answers

Participant 3: Thanks for the talk. My question is basically, in the world of Agile and Kanban, in the world of no story points in big organizations, you have a team of people that need to deliver something together, different teams that need to deliver something together to achieve a project. Let's say rebranding, for example, and then the stakeholder wants to say, "Okay, maybe by March or by April, we are confident that around this time we want to deliver this project or something." They'll have to get some other organization, maybe campaigning or [inaudible 00:37:50], to work towards that date. So, without some sort of sizing or something like that, how do you know when you're going to be able to deliver something, basically?

Wilkinson: How I interpret that was there's a disconnect because if you've got a fixed budget and fixed milestones and you're working in an Agile way, how do you get those things together? Is that right? That's a common problem. Now, the term minimum viable product came around in Agile because the idea is that you want to reach your minimum viable product before your release date. Now, that's the minimum product. And you can estimate that. Obviously, there's a lot of educated guessing occurring within Agile. But the idea is you hit your MVP before that release date.

That means that you just need to be sure that, at your current velocity, you are going to reach it. You're definitely right. I see it all the time, where people use the MVP. When am I going to reach my MVP? Is that going to be on this particular day? They use that as the delivery milestone. But it's teaching that lesson, which I think is the answer to that question. And using your methodology right, whether that be Scrum or Kanban or some such.
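As a back-of-the-envelope illustration of that velocity check, here is a small Python sketch with entirely invented numbers: the remaining MVP scope, recent velocities, and release date are all hypothetical.

```python
# At the team's current velocity, is the MVP reached before the release
# date? All figures are invented.
from datetime import date, timedelta

remaining_points = 120             # points left in the MVP scope
recent_velocities = [18, 22, 20]   # last few sprints, this team's own units
sprint_length = timedelta(weeks=2)
release_date = date(2019, 7, 1)    # illustrative milestone

velocity = sum(recent_velocities) / len(recent_velocities)  # 20 points/sprint
sprints_left = -(-remaining_points // int(velocity))        # ceiling division
forecast = date.today() + sprints_left * sprint_length

print(f"Forecast MVP completion: {forecast}")
print("On track" if forecast <= release_date else "At risk: re-plan or re-scope")
```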

Participant 4: Thanks for the talk, really good points. Really agree with what you said about metrics not being the same as targets. As you said earlier in the talk, though, as soon as you measure something, you have an impact. Do you have any recommended strategies for how you can avoid metrics inadvertently becoming targets? Because as soon as managers are measuring something, people are going to think, "Oh, I'll get promoted if I bring my lead times down," or anything like that.

Ince: Well, that's a really good question, isn't it?

Wilkinson: That's a very good question, yes. I think it's human nature to lean towards stats. People want to achieve something as a team, and it's difficult not to set those kinds of targets, especially when you're trying to measure something. But it's important to reiterate what the difference is between metrics and targets in that sense. I think that if you were to use a series of metrics, like those in the Accelerate book, then what you're going to see is that whole virtuous cycle, where one improves the next, and that one improves another. Now, if someone says, "Right, what I want to see is a low MTTR, I want high deployment frequency, I want to see this and this," and sets those as targets, then obviously what you've got is a bureaucratic environment, which is also what you're trying to avoid.

Participant 5: Thanks for your talk. In my experience, story points are usually used as a measure of relative sizing between different features. One question that's come up recently in our organization, which I guess goes in the direction of what you were saying about not being able to use them for comparison between different teams, is whether or not you could use some form of algorithmic estimation system, similar to function points back in the 1970s, to try and standardize story points rather than use them as a measure of relative size. Does the question make sense?

Ince: I was going to say I don't think function points worked very well back then, did they really?

Wilkinson: No. I think it's important. Obviously, you could do some kind of statistics around the points values themselves by recording your velocity and recording the size of all the stories. But when you're sizing a story, that is essentially a matter of opinion for whoever's in your ceremony at that time. And, again, it's against the designated baseline story. So, do the numbers really mean anything at that point, if that's what you're recording and creating the averages on?

Participant 5: I think you're referring to the estimation by expert judgment or relative sizing, essentially taking a team of experts and saying, "We understand that we've previously given an estimate on a particular scale, usually within an order of magnitude. So, we're going to compare the next feature to the previous features in order to determine the size." That's my experience as well. I just wondered whether or not you'd seen a positive use case for an algorithmic approach to producing story points, or some similar estimation mechanism or measure within an Agile context?

Wilkinson: I've never seen something where it can be produced automatically. I think this ties into that principle of continuous delivery where humans solve problems and computers handle repetitive tasks. I think this is just one of those problems that we've got to do ourselves. If you go from a series of acceptance criteria and you're trying to translate that into the complexity of the task for the development teams, there's definitely a human element there. I don't think that could be predicted. But it would be interesting to run some numbers.

 


 

Recorded at:

Apr 18, 2019
