Earlier this year, InfoQ published the article Evaluating Agile and Scrum with Other Software Methodologies by Capers Jones, in which he compares the effectiveness of Agile and Scrum with other software development methods using several standard metrics.
InfoQ interviewed Capers about measuring agile performance, the use of measurements in general, and how measurements can support agile adoption.
InfoQ: Thanks Capers for doing this interview for InfoQ. For those readers that do not know you, can you briefly introduce yourself?
Capers: I am Vice President and CTO of Namcook Analytics LLC. For more than 40 years I have been collecting quality and productivity data, and I have published more than a dozen books and 100 journal articles, including an article on software measurement in Scientific American magazine.
InfoQ: We have looked at the data that you have shared with us, and it’s extensive. Can you give some examples of how companies are using the data?
Capers: Companies use both the data and our predictive tool, Software Risk Master (SRM). We do studies before projects start that show risks, schedules, staffing, effort, costs, and quality. These studies are supported by historical data from similar projects. The idea is to minimize risks of delays, cost overruns, and outright failure and litigation.
For example, we can show clients side-by-side examples of waterfall, Agile, XP, and many other development methods. Other companies do the same kinds of things; many commercial estimation tools such as KnowledgePlan, SEER, SLIM, and SRM support Agile projects.
InfoQ: What are the main barriers that you see in the adoption of measurement practices? And what can we do to address them, and help organizations when they want to use measurements with agile?
Capers: Agile is a fairly successful method to speed up development by recognizing that changes are normal and should be handled easily by development methods. This means that the final size of the application may be ambiguous. However Agile has been used long enough to have some historical results that can show useful information such as number of sprints, rate of growth over time, and accumulated effort for development and even for maintenance after release. Quality results are also available for some projects.
There is a strong interest among top executives at the CEO, CIO, and CTO levels to know how projects will turn out before spending large amounts of money on them. Measures of results can improve future predictions. If the Agile community wants to convince CEOs and other C-level execs that Agile works, then measuring both quality and productivity of Agile would help.
One specific problem is that manual counting of function points is somewhat slow, which makes it a barrier for Agile projects, and it also assumes a fixed size. Modern high-speed methods that can predict size in a few minutes, plus rates of change, seem more suitable for Agile environments. Several companies are working on automated or high-speed function points, including CAST Software, Relativity Technologies, and my own company.
InfoQ: When organizations are migrating from waterfall projects to an agile approach, would that impact the measurements that they are using to manage their software development?
Capers: This question seems to assume that Agile is the only valid destination from waterfall. There are multiple software methods that are just as effective as Agile, and some are better. IBM’s Rational Unified Process (RUP) and Watts Humphrey’s Team Software Process (TSP) often have better quality than Agile.
Several Agile variants such as Extreme Programming (XP) and Crystal Development also look good. A few methods such as IntegraNova produce software much faster than Agile does. IntegraNova is a new method from Spain being used by the German military and demonstrated to the U.S. Department of Defense. It uses a combination of requirements models and application generation to create full applications in a couple of days. Custom design and hand coding, even with Agile, is intrinsically slower than application generation or using certified reusable components.
By limiting the question to only waterfall and Agile the question loses meaning. It is like saying “when you decide to switch from candy bars to potato chips as your main meal what will that do to your health.” You need more than one choice. You also need to include “hybrid” which often is a very solid choice.
InfoQ: Point taken. My question then is: Are there some basic measurements that you can use when migrating from one methodology to another, metrics that can help you to track your performance?
Capers: For measuring productivity the two most widely used measures in the world are “function points per staff month” and its reciprocal, “work hours per function point.” For the purposes of judging productivity, rates above 10 function points per staff month are good; rates below 7 function points per staff month are not so good. Some Agile projects have topped 15 function points per staff month, and this almost never happens for waterfall.
For measuring quality the most effective measures are “defect potentials” normalized using function points and “defect removal efficiency.” Defect potentials are the sum of bugs that will be found in requirements, design, code, documents, and bad fixes. A defect potential above 5 per function point is bad; a defect potential below 3 per function point is good.
Removing more than 99% of defects is the goal, but the U.S. average is only about 85%. Getting rid of more than 95% is good; below 85% is bad. Most Agile projects don’t measure either defect potentials or defect removal efficiency, so they don’t really know much about quality.
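The productivity and quality measures Capers describes boil down to a few simple ratios. A minimal sketch of those calculations, using hypothetical project numbers (the function names and the example figures are illustrative, not from any real tool):

```python
# Sketch of the metrics discussed above, applied to a hypothetical project.

def productivity_fp_per_staff_month(function_points, staff_months):
    """Function points per staff month; above 10 is good, below 7 is not."""
    return function_points / staff_months

def work_hours_per_function_point(work_hours, function_points):
    """The reciprocal productivity measure."""
    return work_hours / function_points

def defect_potential_per_fp(total_defects, function_points):
    """Total expected bugs (requirements, design, code, documents, bad
    fixes) normalized by size; below 3 per FP is good, above 5 is bad."""
    return total_defects / function_points

def defect_removal_efficiency(found_before_release, found_after_release):
    """Fraction of all defects removed before release; above 95% is good,
    the U.S. average is only about 85%."""
    total = found_before_release + found_after_release
    return found_before_release / total

# A hypothetical 1,000 function point project:
print(productivity_fp_per_staff_month(1000, 80))   # 12.5 -> good
print(defect_potential_per_fp(4000, 1000))         # 4.0 per FP
print(defect_removal_efficiency(3600, 400))        # 0.9 -> 90%
```

Note that defect removal efficiency requires counting defects for a fixed interval after release as well as before, which is exactly the data most Agile projects do not collect.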
InfoQ: About the Team Software Process (TSP) methodology: there were comments from InfoQ readers on your article comparing software methods claiming that it was a “sales pitch” for TSP. How do you respond to that?
Capers: TSP was developed by the late Watts Humphrey. It is endorsed by the Software Engineering Institute (SEI). I had nothing to do with the development of TSP and I have no business relationship with the SEI.
So far as I know the only revenues that derive from using TSP would be the money spent for buying Watts Humphrey’s book on TSP, and royalties paid to Watts’ estate. I make no money at all from TSP nor from any other methodology. I don’t sell methodologies, I only measure their results.
I have derived income from measuring TSP projects but no more than from measuring any of 34 other methodologies. My paper reported the results of 10 methodologies. I was not paid anything by any methodology vendor.
InfoQ: Agile teams can use story points to track their performance. Can they be used in a similar way as function points, given that they are often relative to a single team/product/project?
Capers: Story points are popular among Agile projects and have some intellectual value because they lead to thinking about what users are going to do with software. In that sense they are valuable.
Story points have no international standards nor any certification exams as do function points. As a result, what is called a "story point" can vary by more than 3 to 1 from project to project and company to company.
In the few projects that have used both story points and IFPUG function points there seems to be about a 2 to 1 ratio; i.e. there are about twice as many function points for the same application as there are story points.
If you want to compare a project against another project you may be out of luck. There are benchmark groups with function point data for more than 50,000 projects, including about 5,000 that are available to the public from the International Software Benchmarking Standards Group (ISBSG.org). So far as I know there are no good benchmarks for story points.
InfoQ: Are there specific Agile practices that have proved to be beneficial for teams? Which ones are they, and how do they deliver value for the organization?
Capers: In general scrum stand-up meetings seem to be both popular and beneficial. Of course scrum practices can be used with non-Agile methods, although they did originate with Agile. Most people think of scrum as exclusive to Agile, but that is not the case: once things like stand-up scrum sessions show value they spread to other methods too.
One of the more successful Agile practices has been “embedded users” who become part of the team and both provide requirements in real time and also validate that requirements have been met. The caveat with embedded users is that it is effective only for fairly small projects with few users. For example Microsoft Office has more than 50,000,000 users worldwide and no single user can possibly understand all the features of such a large and complex application suite.
There is a similar caveat with application size. Up to about 1000 function points a single user can understand system requirements. For applications that are in the 10,000 to 100,000 function point size range no single person knows more than about 5% of the overall requirements. For these big systems focus groups and market studies are needed in addition to embedded users.
InfoQ: What does it take for an organization that is using agile methods to deploy measurements effectively?
Capers: This is not just an Agile issue. From on-site interviews with hundreds of software development teams, the U.S. average for collecting historical project data is only about 37%. The most common omissions are unpaid overtime, project management, and the work of part-time specialists such as tech writers, quality assurance, integration, and configuration control personnel. CEOs have a lower opinion of software groups than of other technical groups due to consistently optimistic estimates, schedule delays, cost overruns, poor quality at delivery, and outright failures; software is much worse than other fields in all of these. Agile was intended to improve the situation and has done so up to a point, but not for large systems bigger than 10,000 function points, which remain troublesome even with Agile.
Better measures of projects, including Agile projects, will improve the professional status of the software community and perhaps lead to CEOs having more respect for software groups than they have today. Many CEOs regard software groups as a painful necessity. A former chairman of ITT, Lyman Hamilton, gave a public speech in which he said software engineers need three years of on-the-job training before they work as well as other kinds of engineers such as electrical or mechanical.
InfoQ: What about outsourcing companies, do they measure their performance?
Capers: They do, so unless Agile starts producing quantitative data the Agile teams are going to put themselves at risk of being outsourced. As an example, 60% of Indian companies use function point metrics and use data to demonstrate success. The government of Brazil now requires function points for all government software contracts, and the governments of South Korea and Italy may do the same. Software projects need to be under control, and function points are a key success factor. If a U.S. agile group wants to do business overseas, they will need better measures than they have today. Otherwise they can’t even submit bids in countries such as Brazil and South Korea.
U.S. companies could learn a lot from the way contracts are handled in Brazil. Size is known in function points and effort, schedules, costs, etc. are predicted early with pretty good accuracy.
InfoQ: If an organization only wants to do one measurement, which one would you recommend? Could you explain why?
Capers: Organizations that only want to do one thing will fail no matter which one thing they choose. It takes more than one.
The best measures to convince CEOs and others that agile is real would be:
- Schedules compared to other methods such as RUP, TSP, and waterfall
- Effort and costs compared to other methods
- Productivity measured using function points per staff month, or the reciprocal of work hours per function point
You need multiple measurements, no matter what the one thing is. It is like trying to choose only one product as your only food; no matter whether it is steak or Brussels sprouts if you only eat one thing you will eventually get sick and probably die of malnutrition.
InfoQ: True, but organizations can’t do everything at the same time. Any final advice on how to get started with measurements, or on what they can do to improve their capability to use measurements?
Capers: In essence measurement is fairly simple. You need to know the size of the application in a standard metric such as function points. You need to know the development effort, and you need to know how many bugs or defects were found before release and for a fixed interval after release.
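The three basics Capers lists (size, effort, and defects before and after release) can be captured in a single minimal record per project. A sketch under those assumptions; the class name, fields, and the 90-day post-release window are illustrative choices, not prescribed by any standard:

```python
from dataclasses import dataclass

@dataclass
class ProjectMeasurements:
    """Minimal per-project measurement record: size in function points,
    development effort, and defects found before release and during a
    fixed interval afterwards (here assumed to be 90 days)."""
    function_points: float
    staff_months: float
    defects_pre_release: int
    defects_post_release: int  # found in the first 90 days after release

    def productivity(self) -> float:
        """Function points per staff month."""
        return self.function_points / self.staff_months

    def removal_efficiency(self) -> float:
        """Share of total known defects removed before release."""
        total = self.defects_pre_release + self.defects_post_release
        return self.defects_pre_release / total

# Hypothetical project: 500 FP, 50 staff months, 1,800 bugs found
# before release and 200 in the 90 days after.
p = ProjectMeasurements(500, 50, 1800, 200)
print(p.productivity())         # 10.0 FP per staff month
print(p.removal_efficiency())   # 0.9 -> 90%
```

With records like this accumulated over a few projects, an organization can compare itself against the function point benchmarks Capers mentions.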
This data is required for many contracts and is increasingly being required by CEOs and CIOs of large companies. Software development needs to switch from being a low-grade craft to a true profession, and measurements are a step in that direction. Anyone who has paid a bill from an attorney or a bill from a physician’s office can see that measures are a standard part of professions. If physicians and attorneys can measure, so can the software community.
When an Indian outsource company visits a U.S. company and gives the CEO a proposal to take over software development, in all probability the proposal will include expected function point productivity and quality data. How can U.S. software teams compete with outsource groups if they don’t know either their productivity or quality?
InfoQ: Are you interested in hearing about the experiences of InfoQ readers with measurements?
Capers: If any readers of this interview have measures it would be interesting to discuss them and find out how they were used.
Available for download: "Corporate software risk reduction in a Fortune 500 company"
This paper from Capers Jones describes how a Fortune 500 manufacturing company decided to embark on a corporate-wide software risk reduction program. The risk reduction activities took place over a four-year period and can serve as a model for other large corporations that are dissatisfied with canceled software projects, schedule delays, cost overruns, litigation from disgruntled clients, and other endemic software problems.
About the Interviewee
Capers Jones is currently vice president and chief technology officer of Namcook Analytics LLC. This company designs leading-edge risk, cost, and quality estimation and measurement tools. He was the president of Capers Jones & Associates LLC until 2012, a software researcher and manager at IBM for 12 years and Assistant Director of Programming at the ITT Corporation where he started their software measurement program. Capers is a well-known author and international public speaker. Some of his books are The Economics of Software Quality (with Olivier Bonsignour), Software Engineering Best Practices, Applied Software Measurement and Estimating Software Costs. He is currently working on his 15th book entitled The Technical and Social History of Software Engineering, to be published in the autumn of 2013.
Capers and his colleagues have collected historical data that is used for judging the effectiveness of software process improvement methods and for calibrating software estimation accuracy, and is cited in software litigation in cases where quality, productivity, and schedules are part of the proceedings. He has also worked as an expert witness in 15 lawsuits involving breach of contract and software taxation issues.