Size Estimation Approaches for Use with Agile Methods

Feb 06, 2017 23 min read

Key Takeaways

Software size is needed for estimating and measurement
Five software sizing methods can be used
User stories/story points are the most popular sizing measure for sprints
Function points are the most popular measure at the agile project level
There is no consensus over which sizing method is best

There is strong agreement among software practitioners that estimates are needed to bound the resources required to successfully complete development projects, agile included. While some controversy exists within the agile community over whether such estimates are needed for sprints or iterations, many agree that they are needed at the project level and higher for the following purposes:

To scope the time and effort needed to successfully deliver quality software products.
To assess the risk associated with such estimates as a function of project scope and variation.
To assess the feasibility of delivering working software per such estimates based on the team’s, firm’s and/or industry’s past experience.
To assess the make versus buy tradeoffs, including those associated with off-shoring and/or out-sourcing part or all of the work involved in the software delivery.
To assess the cost of software quality tradeoffs and their impact on product delivery.
To determine the relative scope of software deliveries in terms of size (user stories, function points, etc.) when schedules and team size are fixed, i.e., design to cost.
To determine whether the “offerors” responding to a software solicitation can deliver the desired scope as promised on time and within negotiated budgets.
To assess the iron triangle tradeoffs associated with varying scope, schedule and/or effort as the software effort progresses.
To determine whether it is feasible to deliver the software with the agreed functionality within the time remaining using allocated resources, i.e., develop an estimate-to-complete.
To assess whether or not the project was successful, i.e., delivering the scope promised on schedule and budget.
To assess the value of software delivered via agile methods, i.e., the return-on-investment or cost/benefits accrued.
To assess the technical debt associated with software delivered via agile methods, i.e., the cost to fix software defects delivered as part of the product.

It is important to recognize that such estimates are driven by the size of the job which can be represented by a variety of related metrics (function points, user stories/story points, etc.). The purpose of this article is to identify the most popular agile size metrics and their relative strengths and weaknesses from a user point-of-view. To perform this assessment, we conducted a fact-finding survey on the topic to which 112 practitioners responded. This article summarizes our conclusions. For those interested, the report that documents our full findings is available on our web site at http://www.reifer.com/products.

What Does It Mean to be Agile?

To begin, when we say agile we mean that those professing to be agile have put the principles of the Agile Manifesto¹ into practice. Per this definition, these groups include those who use methods like Extreme Programming (XP)², Agile Unified Process (AUP)³, Scaled Agile Framework (SAFe)⁴, Scrum⁵, Scrum of Scrums⁶, and/or other techniques including hybrids that may embrace Kanban⁷, lean⁸ and/or traditional plan-driven⁹ development approaches to develop their software products. The size of such projects is important because it can directly influence the selection of the agile method used. For example, as shown in Figure 1, Scrum was used primarily for small to medium-size projects, while the other methods portrayed were primarily employed on larger agile at-scale projects¹⁰.

Figure 1: Agile Methodology Usage by No. of Organizations by Size of Project

Notes

Small: project that delivers a product that can be developed by a single agile team.
Medium: agile project that uses 2 to 5 teams at same locations to develop products.
At Scale: large project that uses 5 or more teams sometimes at different locations to develop products.

Table 1 identifies how the methods identified by our fact-finding survey are currently being used across application domains by forty large organizations¹². The heavy use of hybrids for agile at-scale developments seems driven by the desire of these organizations to wed agile methods with the management infrastructure (policies, processes, people, contracts, etc.) that they use to manage operating units across the organization.

Application Domain	AUP	SAFe	SOS¹	Hybrid Agile/Traditional	Hybrid Agile /Lean/Traditional
Automation	1			2	2
Defense		2	1	6	1
Financial/Banking			1	4
Information Technology	1	4	2	1	2
Telecommunications	1	3	1		1
Web Business		1	1	1	1
TOTALS	3	10	6	14	7

Table 1: Number of Enterprises Using Primary Methods by Application Domain

Based on further investigations, Scrum-of-Scrums was most heavily used by those firms that had embraced Scrum as their first agile methodology and then fanned it out enterprise-wide. AUP, SAFe, and hybrid methods were brought in by firms primarily as a means to deal with issues associated with agile-at-scale developments when fan-out was not the primary consideration.

Survey Methodology

To conduct the survey, we used the approach shown in Figure 2 and briefly explained as follows:

Figure 2: Five Step Survey Approach

Step 1: Data Gathering

To begin the process, we developed a questionnaire. In parallel, we solidified agreements with contacts from firms to participate. After several rounds of interviews, both the goals of the effort and the questionnaire to be used were finalized with our stakeholders. Then, after a period of about a month, we started to receive data from participants.

Step 2: Review and Validate Data

Once the data was received, we reviewed responses for omissions and mistakes. Because those participating had helped to develop the questionnaire, the data received was mostly good. However, our review identified some problems which we quickly helped participants correct via phone calls, telecons and site visits. Once finalized, we tallied and combined the data in an Excel database where the entries were code-named to protect the identity of respondents.

Steps 3 and 4: Test the Resulting Analysis Databases and Develop Findings

We next binned the data and then iteratively checked the resulting data sets for completeness, inconsistencies and validity. Binning was done by influence factors, i.e., agile method used, application domain and project size. The application domains we used for this purpose ranged from defense to Information Technology (IT) to telecommunications. Finally, we used past agile study results which we and others had published about this topic¹², ¹³, ¹⁴ to check the validity and reasonableness of the findings.

Step 5: Publish the Findings

As our final step, we published the findings. To ensure that we did not misrepresent the facts, we first circulated a draft and asked several agile subject matter experts to review the document. This article is a synopsis of that report, which is available on our web site.

Sizing Methods

The five major sizing methods used by the survey participants were (1) sizing by analogy, (2) function points, (3) Halstead vocabulary, (4) proxies and (5) user stories/story points. The two methods most used were sizing by analogy and function points. Proxies and stories/story points were next in popularity because of their relationship with Scrum. Halstead vocabulary was judged to hold promise. In addition, combinations of methods were used by some organizations, mostly for larger projects. For example, story points were used at the team level to compute velocity, while function points were used by projects for estimating and other purposes.

A summary of each of these methods along with examples of use and a summary of major strengths and weaknesses follows:

Sizing by Analogy – To use this method, several representative software packages for which size information exists are selected to serve as references. Those developing the estimates then adjust the size estimates for the new packages based on the similarities and differences relative to size, difficulty and other salient characteristics of the program being analyzed. As an example, those developing operating system software may compare the packages they are developing to existing schedulers, dispatchers, memory managers, command managers, and utility packages, to name a few, for which they have actual size data. For instance, the scheduler may be sized at one hundred function points + 50% to address the risks associated with its difficulty and the complexities of its real-time domain.
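The scheduler example above can be sketched in a few lines of Python. This is only an illustration: it reads the example as a reference scheduler with 100 function points of actual size, adjusted upward by 50% for difficulty and real-time complexity. The names Reference and size_by_analogy are hypothetical and do not come from the survey report.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """A previously built package whose actual size is known."""
    name: str
    size_fp: float  # actual size in function points


def size_by_analogy(reference: Reference, adjustment: float) -> float:
    """Scale the reference size by a fractional adjustment (0.5 means +50%)."""
    return reference.size_fp * (1.0 + adjustment)


# Reference scheduler of 100 function points, plus 50% for risk and difficulty.
scheduler = Reference(name="scheduler", size_fp=100)
estimate = size_by_analogy(scheduler, adjustment=0.50)
print(f"{scheduler.name}: {estimate:.0f} function points")
```

In practice the adjustment would be argued from the similarities and differences between the new package and the reference, not picked as a single number.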

Major Strengths

Easy to do if you have a representative sizing database that can be used for comparisons.
Methodology-independent and can be used for applications of any size.
Can be based on user stories, epics or other agile requirements documents.

Major Weaknesses

Only as good as the examples in your database and their characterizations.

Function Points¹⁵ – Function points are a functional measure of software size developed to reflect the smallest unit of activity understood by a user or customer. Function points define size in terms of well-defined characteristics of a software deliverable. As Figure 3 shows, size predictions are based on the number of external inputs, external outputs, external inquiries, external interface files, internal logical files and other pertinent parameters. Using requirements as their basis, each of these parameters can be estimated using standardized counting conventions. Function points can then be calculated from these counts using the standard formulas¹⁶ developed for that purpose.

Figure 3: Function Point Counting Parameters

As an example, you could use the following information to easily develop a size estimate in function points for a flight check-in application using the formulas developed for this purpose:

One external query (can I check-in)
One external input (flight information fact sheet)
Two external outputs (check-in status alert and possible error condition)
One external interface file (flight database)

Some useful guidelines for counting function points on agile projects have been developed and are readily available on the web¹⁷, ¹⁸.
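As a rough sketch, the flight check-in counts above can be turned into an unadjusted function point total. The weights in the table below are the commonly published IFPUG unadjusted weights and should be checked against the Counting Practices Manual before use. The article does not rate the complexity of each item, so applying one complexity level to every item is an assumption made here for illustration.

```python
# Commonly published IFPUG unadjusted weights by complexity rating.
# EI = external input, EO = external output, EQ = external inquiry,
# ILF = internal logical file, EIF = external interface file.
WEIGHTS = {
    "EI": {"low": 3, "average": 4, "high": 6},
    "EO": {"low": 4, "average": 5, "high": 7},
    "EQ": {"low": 3, "average": 4, "high": 6},
    "ILF": {"low": 7, "average": 10, "high": 15},
    "EIF": {"low": 5, "average": 7, "high": 10},
}


def unadjusted_function_points(counts, complexity="average"):
    """Sum weight * count, applying one complexity rating to every item (assumption)."""
    return sum(WEIGHTS[kind][complexity] * n for kind, n in counts.items())


# Counts taken from the flight check-in example.
check_in = {"EQ": 1, "EI": 1, "EO": 2, "EIF": 1}

for rating in ("low", "average", "high"):
    print(rating, unadjusted_function_points(check_in, rating))
```

Under these assumptions the application comes out between 19 and 36 unadjusted function points, with 25 at average complexity. A certified counter would rate each item individually.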

Major Strengths

Well-defined, mature approach controlled by an international standardization group.
Methodology-independent and can be used for applications of any size.
Can be based on user stories, epics or other agile requirements documents.
Industry data and benchmarks are available for productivity, cost and quality¹⁹, ²⁰.
Counting conventions are available and there are certification requirements for counters.
Can improve the accuracy of estimates because counters are available to measure the actual size of delivered software; i.e., can refine the estimating process based on actual experience.

Major Weaknesses

Practices advocated for use for sizing user stories using function points are relatively new and unproven.
There are several standards for function points, so you need to identify which one is being used.
Counting can be time-consuming and you will need to be trained in order to do it right.

Halstead Vocabulary²¹ – Vocabulary in the Halstead sense expresses size by counting the number of unique operands and operators used to express a program. How hard the program is to understand when reading or writing it is expressed by the Halstead complexity measure. Once these metrics are computed, you can use vocabulary size to estimate the effort and the time it would take to develop the program. For agile projects, unique “nouns (payroll)” and “predicates (verb-subject pairs like process payroll)” can be used to represent “operands” and “operators” in agile stories. The resulting average operand and operator counts can then be converted to Unadjusted Function Point (UFP) counts using “small,” “medium” and “large” reference stories. The advantage of this approach is that counting can be easily automated using a standard text editor, which can be configured for that purpose. The three reference stories devised for this purpose are:

Small reference story – “noun” and “predicate” counts within a range of 2 to 5; has a UFP count ranging from 2 to 12 across all application domains based on a Halstead rating adjustment for difficulty of understanding of “easy,” “moderate,” or “hard.”
Medium reference story – “noun” and “predicate” counts within a range of 6 to 10; has a UFP count ranging from 13 to 25 across all application domains based on a Halstead rating adjustment for difficulty of understanding of “easy,” “moderate,” or “hard.”
Large reference story – “noun” and “predicate” counts within a range of 11 to 15; has a UFP count ranging from 26 to 50 based on a Halstead rating adjustment for difficulty of understanding of “easy,” “moderate,” or “hard.”
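The lookup from story counts to a reference story can be sketched as follows. The count ranges and UFP ranges are taken from the three reference stories above. Everything else is an assumption made for illustration: the function name match_reference_story is hypothetical, the noun and predicate counts are supplied by hand rather than extracted from story text, and a story is matched using the larger of its two counts because the article does not say how to treat a story whose counts fall in different ranges.

```python
from typing import Optional, Tuple

# (label, min count, max count, min UFP, max UFP) from the reference stories.
REFERENCE_STORIES = [
    ("small", 2, 5, 2, 12),
    ("medium", 6, 10, 13, 25),
    ("large", 11, 15, 26, 50),
]


def match_reference_story(nouns: int, predicates: int) -> Optional[Tuple[str, int, int]]:
    """Return (label, min UFP, max UFP) for a story, or None if out of range.

    Assumption: the larger of the two counts decides the match.
    """
    larger = max(nouns, predicates)
    for label, lo, hi, ufp_lo, ufp_hi in REFERENCE_STORIES:
        if lo <= larger <= hi:
            return (label, ufp_lo, ufp_hi)
    return None


print(match_reference_story(nouns=4, predicates=3))
print(match_reference_story(nouns=7, predicates=9))
print(match_reference_story(nouns=20, predicates=18))
```

Where the estimate falls inside the returned UFP range depends on the “easy,” “moderate,” or “hard” difficulty rating, which this sketch leaves to the estimator.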

Major Strengths

Easy to use and can be automated using a text editor.
Methodology independent and can be used for applications of any size.
Can be based on user stories, epics or other agile requirements documents.

Major Weaknesses

Relatively new and unproven.
Relevance of Halstead numbers is difficult to understand because they differ from other measures used within software circles.

Proxies – Using proxies is a form of sizing by analogy. A variety of standard software packages called proxies could be used instead of actual software for sizing purposes. The advantage of proxies is that they can be developed for both future and commonly developed existing applications for which a firm has actual data and histories. They can also be developed to address requirements for which you may have limited data, like governance and oversight regulations in medical and financial applications. As an example, suppose you are sizing a new security requirement placed on all of your applications whose impact must be factored into your annual budget within a short timeframe. Instead of pondering the impact on each application, you develop a proxy that assesses the range of size variation parametrically. To estimate the budget impact, you then use the proxy to estimate the impact of the requirement on your applications.
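A minimal sketch of the security requirement example follows. It assumes the proxy is characterized by a low, likely and high size impact per application, and that the same proxy applies to every application in the portfolio. The Proxy class, the function point figures and the portfolio size of 12 are all hypothetical values chosen for illustration, not survey data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Proxy:
    """Size impact of one requirement on a typical application, in function points."""
    name: str
    low_fp: float
    likely_fp: float
    high_fp: float

    def impact(self, applications: int) -> Tuple[float, float, float]:
        """Return the (low, likely, high) size impact across a portfolio."""
        return (
            self.low_fp * applications,
            self.likely_fp * applications,
            self.high_fp * applications,
        )


# Hypothetical proxy and portfolio size.
security = Proxy(name="security requirement", low_fp=5, likely_fp=8, high_fp=15)
low, likely, high = security.impact(applications=12)
print(f"{security.name}: {low:.0f} to {high:.0f} FP, most likely {likely:.0f} FP")
```

The resulting range can then be fed into whatever cost model the organization already uses to translate size into budget.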

Major Strengths

Can be used for both existing and future applications for which you have limited data.
Can be characterized for use with software cost estimating and other models which you may use to predict cost and quality.

Major Weaknesses

Only as good as the proxies developed.
Does not work if the proxies are not defined in such a way as to be representative of the application being developed.

User Stories/Story Points²² – For those using the Scrum methodology, user stories are employed to specify the high-level requirements for the application. In contrast, story points are used as a relative measure of the work required to implement the story (including any backlogged features) by a Scrum team independent of other efforts that are going on in parallel. Important related terms include:

Story – a description of desired functionality or features that need to be developed, told from a user or customer viewpoint.
Theme – a collection of related stories.
Epic – a large user story that can be decomposed and often takes weeks to deliver.
Story Points – a relative measure of the size of a story developed by the Scrum team for their own use based on their experience. Various procedures are used to develop the story point estimates, including planning poker. These procedures can use all sorts of devices that make the exercises fun for the team, including 3X5 cards and t-shirts. A good article describing these sizing practices is listed in the references²³.

As an example, suppose your team of five planned, based on their past experience, to develop ten stories per sprint. However, they were able to develop only six during the first sprint and eight during the second one. Based on their current pace, you feel confident that they can realize their goal of ten stories on the third sprint and exceed it on the fourth one.
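The reasoning in this example can be made explicit with a small sketch. It assumes the team keeps improving by the same number of stories each sprint, which is a simple linear extrapolation chosen for illustration and not a method described in the survey. The function name project_velocity is hypothetical.

```python
def project_velocity(completed, sprints_ahead):
    """Project future sprint output by extending the average sprint-to-sprint change."""
    if len(completed) < 2:
        raise ValueError("need at least two completed sprints to see a trend")
    step = (completed[-1] - completed[0]) / (len(completed) - 1)
    return [completed[-1] + step * k for k in range(1, sprints_ahead + 1)]


completed = [6, 8]   # stories finished in sprints 1 and 2
goal = 10            # stories planned per sprint

average_velocity = sum(completed) / len(completed)
projection = project_velocity(completed, sprints_ahead=2)

print(f"average velocity so far: {average_velocity} stories per sprint")
for sprint, stories in enumerate(projection, start=len(completed) + 1):
    status = "meets or exceeds goal" if stories >= goal else "below goal"
    print(f"sprint {sprint}: about {stories:.0f} stories ({status})")
```

With two data points the trend is fragile, so a projection like this is best treated as a conversation starter for the team and not as a commitment.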

Major Strengths

Easy and fun to do.
Develops a relative size for a specific application being developed by a single team that can be used to determine velocity and other useful agile metrics and measures.

Major Weaknesses

Story points were developed to work with Scrum. However, they can and have been used for sizing with other methods.
Story points are a relative measure of size developed by one team that almost never holds for other teams working on the same project.
Lack of standardization makes each use of the approach unique.

Guidelines for sizing with stories and/or story points with scaled agile methods like AUP²⁴, Large Scale Scrum (LeSS) ²⁵, and Scaled Agile Framework (SAFe)²⁶ also exist.

Combinations – As mentioned, combinations of these sizing methods are often used, mostly on large projects, to size the job at the team and project levels. While teams frequently employ user stories/story points during development for internal purposes, projects often use function points, sizing by analogy, proxies and Halstead vocabulary to support project management tasks. Projects may even use Source Lines of Code (SLOC) as their sizing measure, especially when their processes, histories and models have been calibrated accordingly.

Major Strengths

Can compensate for weaknesses in the individual methods, especially when the methods are used appropriately.
Can be used with existing software cost estimating and other models to predict cost, schedule and quality.

Major Weaknesses

Methods may be incompatible with one another. For example, stories/story points are a relative measure of size, while function points rely on empirical data and analysis.
Consistency across organizations is often a problem because of lack of appropriate counting standards.

Conclusions

The five sizing methods reviewed included (1) sizing by analogy, (2) function points, (3) Halstead vocabulary, (4) proxies and (5) user stories/story points. The two methods that were most heavily used by participants were sizing by analogy and function points. Proxies and stories/story points were next in popularity. Halstead vocabulary, being relatively new, was judged by evaluators to hold promise. In addition, combinations of sizing methods including hybrids can be used on larger agile projects to take advantage of each of their strengths and compensate for their weaknesses when there is a fit; i.e., results can be shared between them.

There was no consensus over which method is best. All of the methods, including combinations, have fans and detractors and were deemed useful. Each has been used effectively by survey participants. Each has strengths and weaknesses. Each can be used effectively, singly or in combination with others, by those who understand its fitness for use. However, managers seemed to prefer function points because sizing with them is repeatable: the method is rule-based, and historical databases and benchmarks are available.

Table 2 provides an overall assessment of the five primary sizing methods. We used the following nine criteria to perform the rating:

Foundation – what is the underlying basis for the method?
Rule-based – are there rules that must be adhered to?
Methodology-based – is the method an integral part of an agile methodology?
Standardization – who sets the rules or guidelines for use of the method?
Representative – to what level in the organization are the outputs related?
Repeatable – can results be repeated using different people and, if so, at what level?
Accurate – are sizing estimates accurate when compared to actual sizes?
Useful – are sizing estimates useful and, if so, for what purpose?
Industry Data Available – is industry data on relative size readily available?

Method/ Criteria	Sizing by Analogy	Function Points	Halstead Vocabulary	Proxies	User Stories/ Story Points
Estimates Size	Yes	Yes	Yes	Yes	Yes
Foundation	Comparative	Empirical	Mathematical	Comparative	Relative
Rule-based	No	Yes	Yes	No	No
Method-based	No	No	No	No	Yes - Scrum
Standardization	Enterprise	ISO²⁷ and IFPUG²⁸, ²⁹	Enterprise	Enterprise	Team
Representative	Project	Enterprise	Enterprise	Project	Team
Repeatable	Project	Enterprise	Enterprise	Enterprise	Team
Accurate	Variable	Variable	Variable	Variable	Variable
Useful	Yes – Project Metrics	Yes- Enterprise Metrics Program	Too Early to Determine	Yes- Project Metrics	Yes -Team Metrics
Industry Data Available³	No	Yes	No	No	No

Table 2: Assessment of Agile Sizing Methods

Selection of the best method is a function of the weighting assigned to each of these assessment criteria. For example, function points received the winning nod when evaluators gave preference to repeatable, standardized, rule-based sizing methods. In contrast, stories/story points won out when the use of methodology-based approaches like Scrum was emphasized. On large projects, respondents favored combinations of methods. For example, stories/story points were used at the team level, while function points or proxies were used at the project level for sizing purposes.
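One way to picture this weighting is a simple weighted score over the Yes/No rows of Table 2. The sketch below is an illustration only: the evaluators did not necessarily score methods this way, the weights are hypothetical, and only the three rows of Table 2 that are plain Yes/No answers are used.

```python
METHODS = [
    "Sizing by Analogy",
    "Function Points",
    "Halstead Vocabulary",
    "Proxies",
    "User Stories/Story Points",
]

# Yes/No rows copied from Table 2 (1 = Yes, 0 = No), in METHODS order.
RATINGS = {
    "rule_based": [0, 1, 1, 0, 0],
    "method_based": [0, 0, 0, 0, 1],
    "industry_data": [0, 1, 0, 0, 0],
}


def rank(weights):
    """Return (method, score) pairs sorted by weighted score, highest first."""
    scores = {
        method: sum(weights.get(criterion, 0) * row[i] for criterion, row in RATINGS.items())
        for i, method in enumerate(METHODS)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights favoring rules and benchmarks.
print(rank({"rule_based": 2, "industry_data": 1})[0])
# Hypothetical weights favoring fit with an agile methodology.
print(rank({"method_based": 3, "rule_based": 1})[0])
```

Changing the weights changes the winner, which is the point the paragraph above makes: the ranking reflects the evaluator's priorities as much as the methods themselves.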

When asked about the accuracy of the size estimates each method generates, respondents reported a lot of variability in results, as Table 2 shows. As with traditional developments, the major cause of such variations seems to be requirements changes. For agile development, stories are used to capture the requirements. Because agile views requirements elicitation as an exploration rather than a specification activity, requirements continuously change as development progresses and as users learn what they truly want the software to do. As a result, stories are added and deleted sprint by sprint. Agile projects maintain budget and schedule integrity by deferring and backlogging scope. This is done consciously by having the customer prioritize the features that get delivered each iteration or sprint. Of course, such tradeoffs assume that the architecture is stable and will not be broken as features are added and deleted sprint by sprint. They also assume that the technical debt accrued can be sustained once the product is rolled out for sale or use.

The usefulness of the size estimates was judged based on the breadth of their use. For example, stories/story points are used primarily by agile teams to measure and compute metrics like velocity and burn-down charts. Analogies and proxies are used at the project level as well. In contrast, function points are used enterprise-wide because they can be compared against industry past performance data and industry benchmarks. Finally, it is too early to make a call for Halstead vocabulary because it is new and has not yet been widely used.

From the data we collected we can conclude that estimating size maintains its importance for agile projects because of its relationship to resource estimation, measurement and risk assessment, i.e., relating risks to their cost and schedule impacts. It also confirms that different methods seem more appropriate for sizing jobs at different organizational levels especially when used by agile teams as relative instead of comparative, empirical and/or mathematical measures.

We can also confirm that there are many currently available resources, including sizing methods, models and relevant historical databases²⁹, ³⁰, ³¹, that can be used to effectively generate reliable software sizing estimates, especially when they are adapted and tailored for use with agile methods. These size estimates can be used in turn as the basis of the metrics and measures that many practitioners use to estimate, manage and report the resources (time, people, $, etc.) needed to develop and deliver quality software products for internal use or sale to third parties.

The Future

There seems to be no end to the controversy over which sizing method is best and whether or not estimates are needed for software products being developed using agile methods. Agile purists argue against estimates, while others argue for them. While the debate rages, agile projects continue to get bigger and more complicated. This trend is to be expected as more and more firms embrace agile methods as their preferred approach to develop and maintain software³² including that generated on agile-at-scale projects.

Because enterprises are also embracing agile methods in more areas than just software, new sizing measures are being introduced for other applications. As an example, many system engineering organizations are transitioning to the use of agile methods. As with software groups, their goal is to use measurement to improve both their estimates and their control over quality, timeliness, efficiency and effectiveness of the processes they use and the products they generate. Their main measure of size is the number of systems requirements. If a standard measure is to be used to size jobs across all engineering disciplines, the open question is whether and how such requirements measures relate to those used for software (stories/story points, function points, etc.) and vice versa. So far, there are no answers. However, professional groups like the International Council on Systems Engineering (INCOSE) are working the issues and developing guidelines³³.

The main question that needs to be answered in the near-term is whether existing or new sizing methods will take hold for use with agile methods. Our assessment is that the five methods outlined in this article will continue to be employed especially as agile methods become used in wider contexts. Function points will dominate in enterprise-wide and larger projects. Stories/story points will continue to be used heavily in smaller projects at the team level because of their relationships with Scrum. Analogies and proxies will be used as well especially in situations where past experience can be leveraged. Halstead sizing approaches may or may not take hold. It seems too early to tell as the approach is new and untried. Combinations of methods will be exploited as agile methods are used more widely across disciplines and at the enterprise level. In other words, we see no surprises on the horizon in either the near- or long-term.

References

The following references were cited in this article:

1. M. C. Layton, Agile Manifesto for Dummies, For Dummies, 2012.
2. J. Keyes, Extreme Programming Concepts, Amazon Digital Services, 2015.
3. See
4. See
5. K. S. Rubin, Essential Scrum: A Practical Guide to the Most Popular Agile Process, Addison-Wesley, 2012.
6. See
7. A. Stellman and J. Greene, Learning Agile: Understanding Scrum, XP (Extreme Programming), Lean and Kanban, O’Reilly Media, 2014.
8. A. Shalloway and G. Beaver, Lean-Agile Software Development: Achieving Enterprise Agility, Addison-Wesley, 2009.
9. R. K. Wysocki, Effective Project Management: Traditional, Agile, Extreme, 7th Edition, John Wiley, 2013.
10. D. J. Reifer, Agile Scaling Findings and Productivity Statistics for Larger Enterprises, Reifer Consultants LLC, Jul 2016.
11. D. J. Reifer, Ten Major Findings - Quantitative Analysis of Agile Methods Study (2015), Reifer Consultants LLC, Aug 2015.
12. W. Winston, Microsoft Excel Data Analysis and Business Modeling, Microsoft Press, 2016.
13. A. Di Ciaccio and M. Coli, Advanced Statistical Methods for Analysis of Large Data-Sets, Springer, 2012.
14. C. Lloyd, Data-Driven Business Decisions, John Wiley, 2011.
15. D. Garmus and D. Herron, Function Points Analysis: Measurement Practices for Successful Software Projects, Addison-Wesley, 2000.
16. IFPUG, Counting Practices Manual 4.3.1.
17. For counting function points as defined by COSMIC, see.
18. For counting function points, as defined by IFPUG, when using agile methods, see.
19. To acquire benchmarks, see for a description of the International Software Benchmarking Standards Group (ISBSG) benchmarking services.
20. To acquire benchmarks, see for a description of the Reifer Consultants LLC benchmarking services.
21. M. Halstead, Elements of Software Science, Elsevier North Holland, 1977.
22. M. Cohn, User Stories Applied: For Agile Development, Addison-Wesley, 2004.
23. For Scrum sizing guidelines, see
24. For AUP sizing guidelines, see
25. For LeSS sizing guidelines, see
26. For SAFe sizing guidelines, see:
27. ISO/IEC 20926:2009, Software and Systems Engineering – Software Measurement – IFPUG Functional Size Measurement Method, 2009.
28. The International Function Point Users Group (IFPUG) is a non-profit promoting the use of function points including the use of SNAP (Software Non-Functional Analysis Process).
29. See for SEER software cost estimating model information.
30. See for PRICE TruePlanning cost estimating framework information.
31. See for SLIM cost estimating tools information.
32. D. J. Reifer, Agile Introduction: Are You a Laggard, Reifer Consultants, Jul 2015.
33. INCOSE, Systems Engineering Measurement Primer, Version 2, International Council on Systems Engineering (INCOSE), Nov 2010.

About the Author

Mr. Reifer, as President of Reifer Consultants LLC, a software management consulting firm, is frequently called upon to help clients grow their business, start up large projects, solve operational problems, and introduce new technologies like agile methods, product lines and cloud computing. He is in demand because he focuses on using metrics-based management approaches. Previously, while with TRW, Mr. Reifer served as Deputy Program Manager for their Global Positioning Satellite (GPS) efforts. While with the Aerospace Corporation, Mr. Reifer managed all of the software efforts related to the Space Shuttle. Currently, as President of RCI, Mr. Reifer works as an executive coach advising executives on how to improve their software organizations using new technology. His current focus is on putting agile methods to work effectively.
