#NoEstimates Project Planning Using Monte Carlo Simulation
Customers come to us with a new product idea and always ask two questions: how long will it take, and how much will it cost us to deliver? Reality is uncertain, yet we as software developers are expected to deliver new products with certainty.
To increase the chances of project success, we need to incorporate uncertainty into our planning and exploit it. We can’t control the waves of uncertainty, but we can learn how to surf! We do that by planning with reference class forecasting, which promises more accurate forecasts by taking an "outside view" of the project being forecast, based on knowledge about the actual performance of a reference class of comparable projects. This approach aligns with the #NoEstimates paradigm, which aims at “exploring alternatives to estimates [of time, effort, cost] for making decisions in software development” (Zuill, 2013). To me, #NoEstimates means “No effort estimates”, which stands both for “Effortless estimates”, that is, estimating with minimal effort, and for “Not using estimates of effort”.
The deterministic planning used today forces certainty onto uncertain situations and masks uncertainty instead of highlighting it. It calculates project-specific costs from a detailed study of the resources required to accomplish each activity in the project’s work breakdown structure; in other words, it takes an “inside view” of the project being estimated. For high-level planning, deterministically estimating every work item wastes people’s time and implies precision that isn’t there.
The techniques presented here are fast, and for most projects they will produce more accurate results.
High-level probabilistic planning
Today’s project management paradigm is based on the first principle of Scientific Management, namely “In principle it is possible to know all you need to know to be able to plan what to do” (Taylor, 2006). It does recognize that uncertainty plays a role in project management, but it believes uncertainty can be eliminated by more detailed planning. It models projects as a network of activities and calculates the time needed to deliver a project by estimating the effort required to accomplish each activity in the project’s work breakdown structure (PMI, 2009).
We argue that planners cannot know everything they need to know, that the world as such is uncertain, and that every number is a random variable. We challenge the project management paradigm and suggest that, for planning purposes, it is better to model projects as a flow of work items through a system.
Hence the definition: a project is a batch of work items, each representing independent customer value, that must be delivered on or before the due date. The batch contains all the work items that need to be completed to deliver a new product with the specified capabilities. To prepare the batch, the product scope needs to be broken down into work items, each representing independent customer value. Even a quality-related requirement such as “the system should scale horizontally” needs a work item. It is important that the work items can be delivered in any order, like user stories created following the INVEST mnemonic. We don’t try to estimate the size of the work items. There are only two "sizes" - "small enough" and "too big". The two sizes are context specific. They have no correlation to the "effort" needed. "Too big" should be split and not allowed to enter the backlog.
A probabilistic high-level plan forecasts the initial budget and also the range of the time frame for a project. We don’t plan in detail what is not absolutely necessary to plan. The short-term details, like the scheduling, are done based on the immediate needs and capabilities – and we create these schedules upon the execution of the high-level plan. When executing the high-level plan we have to keep focus on the project intent but we can never be certain which paths will offer the best chances of realizing it. We exploit uncertainty by making a series of small choices which open up further options then observe the effects of our actions and exploit unexpected successes.
We plan probabilistically by using reference class forecasting which does not try to forecast the specific uncertain events that could affect the new project, but instead places the project in a statistical distribution of outcomes from a class of reference projects.
Reference class forecasting
Reference class forecasting is based on the work of Princeton psychologist Daniel Kahneman, who won the Nobel Prize in economics in 2002.
Reference class forecasting for a particular project requires the following three steps (Flyvbjerg, 2007):
- Identification of a relevant reference class of past, similar projects. The class must be broad enough to be statistically meaningful but narrow enough to be comparable with the specific project.
- Establishing a probability distribution for the selected reference class. This requires access to credible, empirical data for a sufficient number of projects within the reference class to make statistically meaningful conclusions.
- Comparing the new project with the reference class distribution, in order to establish the most likely outcome for the new project.
Let’s apply the reference class forecasting method for forecasting the delivery time for a new project.
Identification of a relevant reference class
The projects in the reference class should have comparable:
- Team structures
- Technologies used
- Development processes used and the methods of capturing the requirements
- Client types
- Business domains
Please note that, along with the internal characteristics of the projects, we also compare the contexts in which the projects were executed. The same team may perform differently if the client is a startup rather than a Fortune 500 corporation, because of the different way of collaborating with the stakeholders. On the other hand, when comparing projects we should not go into great detail. Our goal is to establish a reference class that is broad enough to be statistically meaningful but narrow enough to be comparable with the new projects we will be working on.
Establishing a probability distribution for the selected reference class
We need to decide on the metric for which we will establish the probability distribution. The metric should let us take an outside view of the development system that worked on the project, allow us to calculate delivery time, and make sense from the client’s perspective. Takt Time is such a metric. Takt Time is the rate at which finished products need to be completed in order to meet customer demand. It is defined as the ratio of the available production time to customer demand. In other words, Takt Time is the average time between two successive deliveries to the customer.
In what units of time do we measure Takt Time? In manufacturing, Takt Time is measured in hours, minutes, or even seconds for mass production. In knowledge work we measure Takt Time in days.
Here is a diagram presenting the delivery rate for a fictitious project. Each yellow box represents a work item delivered to the customer. Here we are again taking an outside view and are not interested in the “size” of each work item or in the way the development system works internally.
On the left we have the start date of the project and on the right the end date. Five days after the project started, the first work item was delivered, so its Takt Time is 5 days. Seven days after that, two new work items were delivered. What is their Takt Time? The first of them has a Takt Time of seven days, but the second has a Takt Time of zero days, because the time between the two deliveries is zero days. It is not zero minutes, but since we measure Takt Time in days it is zero days. Two days after that, three new work items were delivered. By the definition of Takt Time, one of them has a Takt Time of two days and the other two have a Takt Time of zero days. And so it went until all 10 work items were delivered.
An important thing to note here is that the sum of all Takt Time values equals the delivery time of the project, in this case 22 days.
Here is a histogram of the Takt Time for the above delivery rate. Note the number of work items with Takt Time of zero.
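The walkthrough above can be reproduced with a short Python sketch that converts the day on which each work item was delivered into Takt Time values. It is an illustration, not the author's tooling: the function name `takt_times` is hypothetical, and only the first six delivery days (5, 12, 12, 14, 14, 14) come from the text; the last four (17, 17, 21, 22) are invented so that the ten items finish on day 22.

```python
from collections import Counter
from typing import List


def takt_times(delivery_days: List[int], start_day: int = 0) -> List[int]:
    """Takt Time of each delivered work item, in whole days.

    Each item's Takt Time is the gap since the previous delivery (or since
    the project start for the first item), so items delivered on the same
    day as the previous one get a Takt Time of zero.
    """
    result = []
    previous = start_day
    for day in sorted(delivery_days):
        result.append(day - previous)
        previous = day
    return result


# Hypothetical delivery days of the 10 work items (day 0 = project start).
days = [5, 12, 12, 14, 14, 14, 17, 17, 21, 22]
tt = takt_times(days)
print(tt)              # [5, 7, 0, 2, 0, 0, 3, 0, 4, 1]
print(sum(tt))         # 22, the project delivery time
print(Counter(tt)[0])  # 4 work items with a Takt Time of zero
```

The Takt Time values add up to the delivery time, and same-day deliveries produce the zeros that dominate the histogram.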
Takt Time is calculated by dividing the time T over which the project is or will be delivered by the number of work items delivered:

$$TT = \frac{T}{N}$$

where:
- T is the time period over which the project will be delivered
- N is the number of items to be delivered in [0,T]
- TT is the Takt Time
In our project we have a delivery time of 22 days and 10 stories delivered, so the Takt Time is 22 / 10 = 2.2 days.
That means that on average the time between two successive deliveries is 2.2 days. Note that it is an unqualified average, a single number without variance.
If we know the Takt Time of the system and have N work items to deliver, we can calculate how long it will take the system to deliver all N work items: the delivery time is the Takt Time multiplied by the number of work items to be delivered, $T = N \times TT$.
For instance, if we have to deliver 45 stories and the Takt Time is 2.2 days, it will take the system 45 × 2.2 = 99 days to deliver them.
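As a minimal sketch, the two formulas TT = T / N and T = N × TT fit in a few lines of Python. The function names are illustrative and not taken from any library.

```python
def takt_time(delivery_time_days: float, items_delivered: int) -> float:
    """TT = T / N: the average time between two successive deliveries."""
    if items_delivered <= 0:
        raise ValueError("items_delivered must be positive")
    return delivery_time_days / items_delivered


def linear_forecast(items_to_deliver: int, tt: float) -> float:
    """T = N x TT: a single-number forecast assuming a linear delivery rate."""
    return items_to_deliver * tt


tt = takt_time(22, 10)                    # the fictitious project
print(round(tt, 2))                       # 2.2
print(round(linear_forecast(45, tt), 2))  # 99.0
```

Both results are single numbers with no variance attached, which is the limitation discussed next.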
Here comes an important point: because Takt Time is the average value of a random variable, there is a chance of missing the forecast. To get better odds, we need to use the probability distribution of Takt Time.
We usually don’t know how the Takt Time is distributed. How could we find that out?
Using historical samples via bootstrapping we can infer the distribution of Takt Time and its likelihoods.
Bootstrapping is based on the assumption that the sample is a good representation of the unknown population. Bootstrapping is done by repeatedly re-sampling a dataset with replacement, calculating the statistic of interest and recording its distribution. It does not replace or add to the original data.
In this case the statistic of interest is the average time between two successive deliveries or Takt Time.
Bootstrapping the distribution of Takt Time
Now let’s see how bootstrapping can be applied to infer the Takt Time distribution. The steps are as follows:
1. Take a Takt Time (TT) sample of size n
2. Take the number of work items delivered (N)
3. Draw n observations $TT_i$ with replacement from the sample in step 1
4. Calculate the project delivery time (T) for the resample from step 3 using $T = \sum_{i=1}^{n} TT_i$
5. Calculate Takt Time as $TT = T / N$ using T from step 4 and N from step 2
6. Repeat many times
7. Build the distribution of Takt Time
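The steps above can be sketched in Python using only the standard library. This is a hedged illustration rather than the author's implementation: `bootstrap_takt_time` is a hypothetical name, the run count and seed are arbitrary, and the sample is a hypothetical set of ten Takt Time values summing to 22 days, of which only the first six follow the walkthrough of the fictitious project. The 85th percentile comes from `statistics.quantiles`, whose default method is one of several reasonable percentile definitions.

```python
import random
import statistics
from typing import List, Optional, Sequence


def bootstrap_takt_time(sample: Sequence[float], items_delivered: int,
                        runs: int = 10_000,
                        seed: Optional[int] = None) -> List[float]:
    """Bootstrap the Takt Time distribution from an observed sample.

    sample          -- observed Takt Time values TT_i in days (step 1, size n)
    items_delivered -- number of work items delivered, N (step 2)
    """
    rng = random.Random(seed)
    n = len(sample)
    distribution = []
    for _ in range(runs):                          # step 6: repeat many times
        resample = rng.choices(sample, k=n)        # step 3: n draws with replacement
        delivery_time = sum(resample)              # step 4: T = sum of TT_i
        distribution.append(delivery_time / items_delivered)  # step 5: TT = T / N
    return distribution                            # step 7: the distribution


# Hypothetical Takt Time sample for the fictitious project (sums to 22 days).
tt_sample = [5, 7, 0, 2, 0, 0, 3, 0, 4, 1]
dist = bootstrap_takt_time(tt_sample, items_delivered=10, seed=42)

print("mean  ", round(statistics.fmean(dist), 2))
print("median", round(statistics.median(dist), 2))
print("p85   ", round(statistics.quantiles(dist, n=100)[84], 2))
```

Fixing the seed makes runs repeatable; dropping it gives a fresh set of resamples each time, and with thousands of runs the summary statistics change very little between runs.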
Here is the method applied to the Takt Time data for our fictitious project.
And here is the Takt Time histogram bootstrapped using data from the fictitious project. Note the Median, Mean and the 85th percentile.
Now we have the probability distribution of Takt Time for our fictitious project. Note that the average value of 2.26 is very close to the Takt Time we calculated initially. Now we have not only the average but also the mode, the median and the percentiles. This Takt Time distribution represents context-specific uncertainty and is unique to each context (team structure, delivery process, technology, business domain and client type). It should be preserved in a library of reference classes to be used for forecasting new projects implemented in the same context. Using it greatly reduces both the theoretical knowledge and the effort required, which facilitates the use of probabilistic modeling. The distribution is invalidated if any of the following changes: team structure, development process, technology, client type or business domain.
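One way to keep such a library, assuming each reference class is identified by exactly the five context attributes listed above, is to key the stored distributions by an immutable context record. The class and field names below are hypothetical, and the stored values are placeholders rather than real data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ReferenceContext:
    """Context a Takt Time distribution is valid for (hypothetical fields)."""
    team_structure: str
    development_process: str
    technology: str
    client_type: str
    business_domain: str


class ReferenceClassLibrary:
    """Keeps one bootstrapped Takt Time distribution per reference context."""

    def __init__(self) -> None:
        self._distributions: Dict[ReferenceContext, List[float]] = {}

    def store(self, context: ReferenceContext, distribution: List[float]) -> None:
        # Store a copy so later changes to the caller's list do not leak in.
        self._distributions[context] = list(distribution)

    def lookup(self, context: ReferenceContext) -> Optional[List[float]]:
        # A context differing in any field is a different key, so it finds
        # nothing: the stored distribution does not apply to it.
        return self._distributions.get(context)


# Placeholder values, for illustration only.
ctx = ReferenceContext("single co-located team", "Kanban", "Java",
                       "startup", "e-commerce")
library = ReferenceClassLibrary()
library.store(ctx, [2.1, 2.2, 2.4])
print(library.lookup(ctx))      # [2.1, 2.2, 2.4]
changed = ReferenceContext("single co-located team", "Kanban", "Java",
                           "Fortune 500", "e-commerce")
print(library.lookup(changed))  # None
```

Because the key is a frozen dataclass, changing any single attribute yields a different key, mirroring the rule that a distribution is invalidated when its context changes.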
Comparing the new project with the reference class distribution
An important thing to note is that $T = N \times TT$ assumes a linear delivery rate. Do projects have a linear delivery rate? Not really. This diagram visualizes the rate at which work items were delivered in a real project.
On the X axis we have the project time in days. On the Y axis we have the number of work items delivered each day. It turns out that the delivery rate follows a “Z-curve pattern” (Anderson, 2003) as visualized by the red line.
The Z-curve can be divided into three parts, or legs. There is empirical evidence that during the first 20% of the time the delivery rate will be slow. Then, for 60% of the time, we go faster; this is “the hyper-productivity” period. For the final 20% we slow down again. Of course the numbers may vary depending on the context, but the basic principle of the three sections holds.
Each leg of the Z-curve is characterized by:
- Different work type
- Different level of variation
- Different staffing in terms of headcount and level of expertise
Only the second leg of the Z-curve is representative of the system’s capability. It shows the common cause variation specific to each system. The first and third legs are project specific and are affected by special cause variation.
The first leg of the Z-curve is the time when the developers climb the learning curve and set up their minds for the new project. But this leg of the Z-curve can also be used for:
- conducting experiments to cover the riskiest work items
- setting up environments
- adapting to the client’s culture and procedures
- understanding a new business domain
- mastering a new technology
All above are examples of special causes of variation specific to a project.
The second leg of the Z-curve is the productivity period. If the project is scheduled properly the system should be like clockwork – sustainable pace, no stress, no surprises…
The third leg of the Z-curve is when the team will clean up the battlefield, fix some outstanding defects and support the transition of the project deliverable into operation.
Project delivery time T
Project delivery time can be represented as the sum of the durations of each one of the three legs of the Z-curve. In other words it equals the duration Tz1 of the 1st leg plus the duration Tz2 of the 2nd leg plus the duration Tz3 of the 3rd leg of the Z-curve.
$$T = T_{z1} + T_{z2} + T_{z3}$$
Let’s substitute the duration of each of the three legs with the formula $T = N \times TT$. This gives us a new formula that, if we know the Takt Time and the number of work items to be delivered during each of the three legs of the Z-curve, allows us to calculate how long it will take the system to deliver all N work items, where $N = N_{z1} + N_{z2} + N_{z3}$ is the total number of work items for the project.
$$T = N_{z1} TT_{z1} + N_{z2} TT_{z2} + N_{z3} TT_{z3}$$
Here we are calculating the delivery of $N_{z1}$ work items with Takt Time $TT_{z1}$ during the 1st leg of the Z-curve, plus $N_{z2}$ work items with Takt Time $TT_{z2}$ during the 2nd leg, plus $N_{z3}$ work items with Takt Time $TT_{z3}$ during the 3rd leg. This calculation is not credible because it uses Takt Time as a single number, and we know we should use the distribution of Takt Time instead. We need Takt Time distributions for each of the three legs of the Z-curve, and we already know how to get them using the bootstrap. Then we have to sum them, but by definition they are random variables. How can we sum random variables? Here comes Monte Carlo analysis. Monte Carlo simulation is a tool for summing up random variables (Savage, 2012).
Monte Carlo simulation of Project Delivery Time (T) based on Z-curve
The steps are as follows:
1. Take three Takt Time distributions ($TT_{z1}$, $TT_{z2}$, $TT_{z3}$), each of size n, one for each of the three legs of the Z-curve
2. Take the number of work items to be delivered in each of the three legs of the Z-curve ($N_{z1}$, $N_{z2}$, $N_{z3}$), where $N = N_{z1} + N_{z2} + N_{z3}$
3. Draw one of the n observations, with replacement, from each of $TT_{z1}$, $TT_{z2}$ and $TT_{z3}$
4. Calculate the project delivery time (T) for the draws from step 3 using $T = N_{z1} TT_{z1} + N_{z2} TT_{z2} + N_{z3} TT_{z3}$
5. Repeat many times
6. Build the probability distribution of the delivery time (T)
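Here is a minimal Python sketch of these steps, assuming the per-leg Takt Time distributions are bootstrapped in the way described earlier for a single distribution. The helper names, the per-leg samples, the seeds and the bootstrap run counts are all hypothetical; only the split of 12, 70 and 18 work items and the 50,000 simulation runs come from the example project in this article. Because the input data is made up, the printed figures will not reproduce the 90-day forecast reported for that project.

```python
import random
import statistics
from typing import List, Optional, Sequence


def bootstrap_takt_time(sample: Sequence[float], runs: int,
                        rng: random.Random) -> List[float]:
    """Bootstrapped Takt Time distribution for one leg of the Z-curve.

    Each delivered item contributes one Takt Time value, so N equals the
    sample size n and TT = sum(resample) / n.
    """
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(runs)]


def simulate_delivery_time(tt_z1: Sequence[float], tt_z2: Sequence[float],
                           tt_z3: Sequence[float], n_z1: int, n_z2: int,
                           n_z3: int, runs: int = 50_000,
                           seed: Optional[int] = None) -> List[float]:
    """Monte Carlo simulation of T = Nz1*TTz1 + Nz2*TTz2 + Nz3*TTz3."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):                                 # step 5
        # Step 3: one draw, with replacement, from each leg's distribution.
        t1, t2, t3 = rng.choice(tt_z1), rng.choice(tt_z2), rng.choice(tt_z3)
        # Step 4: project delivery time for this combination of draws.
        results.append(n_z1 * t1 + n_z2 * t2 + n_z3 * t3)
    return results                                        # step 6


# Hypothetical per-leg Takt Time samples in days (not real project data).
rng = random.Random(7)
tt_z1 = bootstrap_takt_time([3, 0, 2, 1, 4, 0, 2, 1], runs=5_000, rng=rng)
tt_z2 = bootstrap_takt_time([1, 0, 0, 1, 0, 2, 0, 1, 0, 1], runs=5_000, rng=rng)
tt_z3 = bootstrap_takt_time([2, 1, 0, 3, 1, 2], runs=5_000, rng=rng)

# 12, 70 and 18 work items per leg, as in the example; 50,000 runs.
t = simulate_delivery_time(tt_z1, tt_z2, tt_z3, 12, 70, 18, seed=11)
print("mean  ", round(statistics.fmean(t), 1))
print("median", round(statistics.median(t), 1))
print("p85   ", round(statistics.quantiles(t, n=100)[84], 1))
```

Drawing one value per leg and multiplying it by that leg's item count follows step 4 literally: within a leg, all items share the sampled average Takt Time, and the randomness comes from which average is drawn.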
Let’s see how we can apply the above algorithm using some real data.
Suppose we have a new project to plan, and we need to give the customer a delivery date. We already have a reference class of projects, and when we compare the new project with the reference class we see that the new project is for the same customer, the same team will be working on it, and it uses the same technology. For the reference class we also have the Takt Time distributions for each of the three legs of the Z-curve.
After some analysis the team broke down the new project scope into user stories and then added some more work items to account for Dark Matter and Failure Load. The team then decided that 12 stories will be delivered in the 1st leg of the Z-curve, 70 stories in the 2nd leg and 18 stories or work items in the 3rd leg.
If we represent the Takt Time values by their respective distributions, then the Monte Carlo simulated summation of $T = 12 \times TT_{z1} + 70 \times TT_{z2} + 18 \times TT_{z3}$ will give us the time needed to deliver the project!
We simulate this summation, say, 50,000 times. That gives us the simulated time needed to deliver the new project.
We end up with a histogram of the projected delivery time for the new project. We are interested in the median, the average and the 85th percentile of the project delivery time (T), and in the shape of the distribution. Based on the projected delivery time histogram, we can take the 85th percentile and use it as a single number.
For this project the 85th percentile is 90 days, so roughly 6 times out of 7 we should have the project delivered in 90 days or less.
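To see how a percentile turns into a commitment, the sketch below takes a list of simulated delivery times and computes the nearest-rank 85th percentile together with the share of runs that finish within a given deadline. The list of 20 values is made up for illustration, and the function names are hypothetical.

```python
from typing import Sequence


def on_time_probability(simulated: Sequence[float], deadline: float) -> float:
    """Share of simulated runs that deliver on or before the deadline."""
    return sum(1 for t in simulated if t <= deadline) / len(simulated)


def commitment(simulated: Sequence[float], percent: int) -> float:
    """Nearest-rank percentile: the smallest simulated delivery time that at
    least `percent`% of the runs meet (percent is an integer, e.g. 85)."""
    ordered = sorted(simulated)
    # Ceiling of percent * n / 100, computed in integer arithmetic.
    rank = max(1, -(-percent * len(ordered) // 100))
    return ordered[rank - 1]


# Made-up simulated delivery times in days, for illustration only.
simulated = [74, 79, 81, 82, 84, 85, 85, 86, 87, 88,
             88, 89, 90, 91, 92, 93, 95, 97, 99, 104]

p85 = commitment(simulated, 85)
print(p85)                                  # 95
print(on_time_probability(simulated, p85))  # 0.85
print(on_time_probability(simulated, 90))   # 0.65
```

By construction, at least 85% of the runs finish within the nearest-rank 85th percentile, which is roughly 6 out of 7; committing to an earlier date lowers that share.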
By taking an outside view when forecasting a new project, we will produce more accurate results faster than with the deterministic inside view. The method presented can be used by any team that uses user stories for planning and tracking project execution, no matter which development process is used (Scrum, XP, kanban systems).
My hope is that you will start using the techniques presented here for planning your next project. And don’t forget that even if we can’t control the waves of uncertainty we can learn how to surf!
Anderson, D. J. (2003). Agile Management for Software Engineering: Applying the Theory of Constraints for Business Results. Prentice Hall.
Flyvbjerg, B. (2007). Eliminating Bias in Early Project Development through Reference Class Forecasting and Good Governance. Trondheim, Norway: Concept Program, The Norwegian University of Science and Technology.
PMI. (2009). A Guide to the Project Management Body of Knowledge. Project Management Institute.
Savage, S. L. (2012). The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty. Wiley.
Taylor, F. W. (2006). The Principles of Scientific Management. Cosimo Classics.
Zuill, W. (2013, May 17). The NoEstimates Hashtag.
About the Author
Dimitar Bakardzhiev is the Managing Director of Taller Technologies Bulgaria and an expert in driving successful and cost-effective technology development. As an LKU Accredited Kanban Trainer (AKT), Dimitar puts lean principles to work every day when managing complex software projects. Dimitar has been one of the evangelists of Kanban in Bulgaria and has published David Anderson’s Kanban book, as well as books by Goldratt and Deming, in the local language.
Most Likely Not Average
The Most Likely is the number that appears most often in the simulation, the mode. That's the number that "most likely" will appear in the future when the work is being done.
Re: Most Likely Not Average
Reminded of this Joel on Software article
Re: Reminded of this Joel on Software article
"We don’t try to estimate the size of the work items. There are only two "sizes" - "small enough" and "too big". The two sizes are context specific. They have no correlation to the "effort" needed. "Too big" should be split and not allowed to enter the backlog."
Most Likely and Average
This is why any number with the next level moment is not reliable for making decisions.
And just another reminder that "bootstrapping" is OK, but a bit of a stretch to be called Monte Carlo Simulation. Bootstrapping is a simulation method used most often in the bio and environmental sciences, where there is no underlying probability distribution that can be used as a generating function. Those domains do not have stochastic processes in the same way project work does.
In your examples the sample sizes are very small, that is, the population sample you're drawing your "bootstrap" values from. This is typical in environmental sciences and some biochem domains.
So MCS has essentially been relabeled in your paper. Those of us who earn our living doing MCS of large complex programs using:
We would not interchange those two terms (bootstrap and MC), so your readers who apply the tools listed above in enterprise IT, manufacturing, or other engineering disciplines will see that straight away.
The paper moves the conversation forward though, so good work.
Re: Reminded of this Joel on Software article
Re: Most Likely and Average
Re: Most Likely and Average
And "Monte Carlo is a tool for summing up random variables" (Savage) is not mathematically correct. It is NOT a summing of random variables. Please explore some of the references I've sent on the underlying processes of MCS, beyond the populist view provided by Savage.
Re: Most Likely and Average
Re: Misuse of MCS terms
The empirical distribution is no longer a simulation, it is empirical - a closed form. MCS is used when there is no closed form for the calculations.
You created the distribution from past samples (empirical samples) and then resampled those samples. That is bootstrapping. You've created the PDF from samples and then resampled. That's not the mathematics of MCS. www.goldsim.com/Web/Introduction/Probabilistic/... is an example of a tool we use, where the dice example you've mentioned "turns out to be" a Gaussian distribution by modeling with MCS the equal probability of the values resulting from the throw of two dice. But the Gaussian PDF was not there first; it is the result of the modeling, not the source of the modeling.
In bootstrapping, you (and those in bio and ecology) produce the PDF then re-sample to apply to a model.
You're redefining the meaning and uses of MCS, and in the process of redefining and actually misusing the term MCS you are creating your own source of criticism of your approach. This is a weak mathematical approach that undermines your desired message with anyone familiar with MCS.
Re: Misuse of MCS terms
"After having explored the above characteristics of the variable, the risk assessor has three basic techniques for representing the data in the analysis. In the first method, the assessor can attempt to fit a theoretical or parametric distribution to the data using standard statistical techniques. As a second option, the assessor can use the data to define an empirical distribution function (EDF). Finally, the assessor can use the data directly in the analysis utilizing random resampling techniques (i.e., bootstrapping). Each of these three techniques has its own benefits. However, there is no consensus among researchers (authors) as to which method is generally superior. For example, Law and Kelton (1991) observe that EDFs may contain irregularities, especially when the data are limited and that when an EDF is used in the typical manner, values outside the range of the observed data cannot be generated. Consequently, when the data are representative of the exposure variable and the fit is good, some prefer to use parametric distributions. On the other hand, some authors prefer EDFs (Bratley, Fox and Schrage, 1987) arguing that the smoothing which necessarily takes place in the fitting process distorts real information. In addition, when data are limited, accurate estimation of the upper end (tail) is difficult. Ultimately, the technique selected will be a matter of the risk."
Reference class forecasting and the outside view
I actually see #noestimates and probabilistic forecasting as orthogonal.
I am curious about the following:
1. You state that projects in the reference class should have comparable:
- Development processes used and the methods of capturing the requirements
Are these typical reference class characteristics based on your experience? How were these features determined?
2. According to the original outside view work (Kahneman and Tversky, Lovallo and Kahneman) and more recent work by Flyvbjerg, the outside view should enable us to check and correct for the forecaster's track record. If we are always underestimating costs, we should adjust upwards, and if we always underestimate duration, we should adjust upwards. Where in your process do you account for historical reliability and accuracy of forecasts? Otherwise, the method described seems to be beneficial since it produces a probability dist for the metric of interest, but it does not necessarily adjust for any forecasting biases or errors.
Re: Reference class forecasting and the outside view
Which leads me to your questions, because I will answer both from an "outside view" perspective. To answer your first question: yes, all reference class project characteristics are based on my experience. I have no statistically derived correlations to report. Please note that all characteristics are "outside view". But defining what the reference class should be is hard; as Venn put it, "It is obvious that every individual thing or event has an indefinite number of properties or attributes observable in it, and might therefore be considered as belonging to an indefinite number of different classes of things…" Hence the characteristics that I use are those that I deem important in my context, which is software development.
To answer your second question - what I propose is following the work of Flyvbjerg which is to examine the experiences of a class of similar projects, lay out a Takt Time distribution for this reference class, and then position the current project in that distribution.
But there is a difference as you rightly pointed out. I don't propose to correct the forecast based on the "historical reliability and accuracy of forecasts". Flyvbjerg's perspective seems to be that if we forecasted a project to finish in 12 months but actually the project finished in 14 months next time we should adjust upwards because our last forecast is considered unreliable.
My perspective is that forecasts are unreliable by definition. The forecast is just the higher-level plan. How we execute and manage the project is more important than the high-level plan. As I put it in the article "The short-term details, like the scheduling, are done based on the immediate needs and capabilities – and we create these schedules upon the execution of the high-level plan. When executing the high-level plan we have to keep focus on the project intent but we can never be certain which paths will offer the best chances of realizing it. We exploit uncertainty by making a series of small choices which open up further options then observe the effects of our actions and exploit unexpected successes."
The execution is very well covered in the best Agile and Lean books. I personally prefer to use the Kanban Method for the execution and management of a project. I also use buffer management, as I present here www.slideshare.net/dimiterbak/project-planning-... on slides 24 to 28. And here is the text behind the slides: www.scribd.com/doc/63782778/Little-Law
The probabilistic forecasting approach I present in my article is just a tool, nothing more. People should use this "outside view" tool along with other tools that use "inside view".