Understanding Quality and Reliability
One of the most overlooked but important areas of software estimation, measurement, and assessment is quality. It often is not considered, or even discussed, during the early planning stages of development projects, but it’s almost always the ultimate criterion for when a product is ready to ship or deploy. Therefore, it needs to be part of the expectation-setting conversation from the outset of the project.
So, how can we talk about product quality? It can be measured a number of ways, but two in particular give excellent insights into the stability of the product. They are:
- The number of defects and errors discovered in the system between testing and actual delivery, and
- The Mean Time to Defect (MTTD), or the amount of time between errors discovered prior to and after delivery to the customer.
The reason we like these two measures is that they both relate to product stability, a critical issue as the release date approaches. They are objective, measurable, and can usually be derived from most organizations’ current quality monitoring systems without too much trouble.
Generally, having fewer errors and a higher MTTD is associated with better overall quality. While having the highest quality possible may not always be a primary concern for stakeholders, the reliability of the product must meet some minimum standard before it can be shipped to the customer. For example, experience has shown that, at delivery, most projects are about 95 percent defect free and will run for about a day without crashing.
Another good rule of thumb is that software typically reaches minimum acceptable reliability when testers are finding fewer than 20 errors per month. At that rate roughly one error surfaces per working day, so the product will run for about an eight-hour workday between failures. This applies to both large and small applications. Of course, this rule of thumb applies mostly to commercial IT applications; industrial and military embedded applications require a higher degree of reliability.
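The link between a monthly error count and an MTTD can be sketched with a little arithmetic. The function below is only an illustration: the name `mttd_hours` and the defaults of 20 working days per month and 8 operating hours per day are assumptions, not figures taken from the QSM data.

```python
def mttd_hours(defects_per_month, working_days_per_month=20, hours_per_day=8):
    """Average operating hours between defects for a monthly discovery rate.

    The working-day and hours-per-day defaults are illustrative assumptions.
    """
    if defects_per_month <= 0:
        raise ValueError("defects_per_month must be positive")
    operating_hours = working_days_per_month * hours_per_day
    return operating_hours / defects_per_month


# 20 errors per month over 160 operating hours is one error per workday.
print(mttd_hours(20))  # 8.0
# Doubling the discovery rate halves the time between defects.
print(mttd_hours(40))  # 4.0
```

Under these assumptions, the 20-errors-per-month threshold corresponds to about one eight-hour workday between failures.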
The Rayleigh defect model
One approach to attaining optimal quality assurance is to use the Rayleigh function to forecast the discovery rate of defects as a function of time throughout a traditional software development process. The Rayleigh function was discovered by the English physicist Lord Rayleigh in his work related to scattering of acoustic and electromagnetic waves. A Rayleigh reliability model closely approximates the actual profile of defect data collected from software development efforts.
The Rayleigh equation can be used to predict the number of defects discovered over time. It can be formulated to cover the period from the High Level Design Review (HLDR, the point at which high-level design is complete) until 99.9 percent of all the defects have been discovered. A sample Rayleigh defect estimate is shown in Figure 1.
Figure 1. Sample Rayleigh defect estimate.
Note that the peak of the curve occurs early in the build and test phase. This means that a large share of the total defects are created and discovered early in the project. These defects are mostly requirements, design, and unit coding defects. If they are not found, they will surface later in the project, resulting in the need for extensive rework.
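A minimal sketch of such a curve uses the standard Rayleigh density, scaled so that 99.9 percent of the defects are discovered by the end of the modeled period. This is the textbook form of the function, not necessarily the exact formulation used in the QSM tools; the total of 1,000 defects and the 24-month duration are hypothetical inputs chosen for illustration.

```python
import math


def rayleigh_scale(t_end, fraction_found=0.999):
    """Scale parameter c such that `fraction_found` of defects are found by t_end."""
    return t_end / math.sqrt(-2.0 * math.log(1.0 - fraction_found))


def defect_rate(t, total_defects, c):
    """Expected defects discovered per unit time at time t (Rayleigh density)."""
    return total_defects * (t / c**2) * math.exp(-t**2 / (2.0 * c**2))


def cumulative_defects(t, total_defects, c):
    """Expected defects discovered from time 0 through time t."""
    return total_defects * (1.0 - math.exp(-t**2 / (2.0 * c**2)))


total = 1000   # hypothetical total defects
t_end = 24     # hypothetical months from design review to 99.9 percent found
c = rayleigh_scale(t_end)

print(f"peak discovery rate at month {c:.1f}")
for month in range(0, t_end + 1, 4):
    rate = defect_rate(month, total, c)
    found = cumulative_defects(month, total, c)
    print(f"month {month:2d}: {rate:6.1f} per month, {found:7.1f} cumulative")
```

In this form the discovery rate peaks at time c, which falls a little past one quarter of the way through the modeled period, consistent with a peak that arrives early.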
Milestone 10 is declared to be the point in time when 99.9 percent of the defects have been discovered. Fewer than 5 percent of the organizations that QSM has worked with record defect data during the detailed design phase. Industry researchers claim that it can cost 10 to 100 times more to fix a defect found during system testing than one found during design or coding (Boehm, 1987; McConnell, 2001), so one could make a compelling case to start measuring and taking action earlier in the process.
Simple extensions of the model provide other useful information. For example, defect priority classes can be specified as percentages of the total curve. This allows the model to predict defects by severity categories over time, as illustrated in Figure 2.
Figure 2. Rayleigh defect model broken out by defect severity class.
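Splitting the curve by severity can be sketched as a fixed percentage of the overall discovery rate for each class. The class names and shares in `SEVERITY_SHARES` are hypothetical placeholders, as are the 1,000 total defects and 24-month duration; real shares would come from an organization's own defect history.

```python
import math


def rayleigh_rate(t, total_defects, t_end, fraction_found=0.999):
    """Rayleigh discovery rate, scaled so `fraction_found` is reached at t_end."""
    c = t_end / math.sqrt(-2.0 * math.log(1.0 - fraction_found))
    return total_defects * (t / c**2) * math.exp(-t**2 / (2.0 * c**2))


# Hypothetical severity mix; the shares must sum to 1.
SEVERITY_SHARES = {
    "critical": 0.05,
    "serious": 0.20,
    "moderate": 0.45,
    "cosmetic": 0.30,
}


def rate_by_severity(t, total_defects, t_end, shares=SEVERITY_SHARES):
    """Apportion the overall discovery rate at time t across severity classes."""
    if not math.isclose(sum(shares.values()), 1.0):
        raise ValueError("severity shares must sum to 1")
    overall = rayleigh_rate(t, total_defects, t_end)
    return {name: share * overall for name, share in shares.items()}


for month in (3, 6, 12, 18):
    breakdown = rate_by_severity(month, total_defects=1000, t_end=24)
    print(month, {name: round(value, 1) for name, value in breakdown.items()})
```

Because each class is a constant fraction of the total, every severity curve has the same shape and peaks at the same time; only the height differs.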
A defect estimate could be thought of as a plan. For a particular set of conditions (size, complexity, efficiency, staffing, etc.), a planned curve could be generated. A manager could use this as a rough gauge of performance to see if their project is performing consistently with the plan and, by association, with comparable historic projects. If there are significant deviations, this would probably cause the manager to investigate and, if justified, take remedial action.
Figure 3 shows how one can use the defect discovery estimate to track and compare actuals. Obviously, the actual measurements are a little noisier than the estimate, but they track the general pattern nicely. They also give confidence that the error discovery rate will be below 20 per month, our recommended minimum acceptable delivery criterion, at the scheduled end of the project.
Figure 3. Defect discovery rate plan with actuals plotted.
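Tracking actuals against the planned curve can be as simple as flagging months where the observed count strays too far from the plan. In the sketch below the monthly figures are invented for illustration, and the 25 percent tolerance is an arbitrary assumption; only the 20-per-month delivery threshold comes from the rule of thumb above.

```python
def compare_to_plan(planned, actual, tolerance=0.25):
    """Return (month, planned, actual, flagged) for each month with actual data.

    A month is flagged when the actual count deviates from the plan by more
    than `tolerance` (as a fraction of the plan). Planned values are assumed
    to be positive.
    """
    rows = []
    for month, (plan, found) in enumerate(zip(planned, actual), start=1):
        deviation = (found - plan) / plan
        rows.append((month, plan, found, abs(deviation) > tolerance))
    return rows


def below_delivery_threshold(planned, threshold=20):
    """True if the planned discovery rate in the final month is under the threshold."""
    return planned[-1] < threshold


# Hypothetical monthly defect counts: a nine-month plan, seven months of actuals.
planned = [30, 55, 70, 72, 64, 50, 36, 24, 15]
actual = [25, 60, 95, 70, 58, 52, 30]

for month, plan, found, flagged in compare_to_plan(planned, actual):
    note = "  <-- investigate" if flagged else ""
    print(f"month {month}: planned {plan}, actual {found}{note}")

print("plan ends below 20 per month:", below_delivery_threshold(planned))
```

With these made-up numbers only month 3 is flagged, which is the kind of deviation that would prompt a manager to take a closer look.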
Defect model drivers
Since 1978, QSM has collected data on over 10,000 completed software projects. Analyses using this data have found that there are specific inputs that determine the duration and magnitude of the Rayleigh defect model. The inputs enable the model to provide an accurate forecast for a given situation. There are three macro parameters that the QSM model uses:
- Size (new and modified)
- Productivity Index
- Peak Staffing
These driving factors shape the defect behavior patterns that we see in software projects. The next sections examine each of them in more detail. Unless otherwise noted, these findings are based on analyses conducted using data from the QSM database.
Size
Historically, we have seen that as project size increases, so does the number of defects present in the system (see Figure 4). Stated simply, building a larger project provides more opportunities for developers to create system defects, and also requires more testing to be completed. This rate of defect increase is close to linear.
Similarly, as size increases, the MTTD decreases. This is due to the increased number of errors in the system, which often stems from the communication complexities that larger teams inevitably introduce. With more errors in the system, the amount of time between defects decreases. As such, larger projects tend to have lower reliability because there is less time between defects. These trends are typical for the industry.
Figure 4. As size increases, so does the number of defects.
Productivity Index (PI)
Productivity also tends to have a great impact on the overall quality of a system and can be measured via the PI. PI is a calculated macro measure of the total development environment. It embraces many factors in software development, such as management influence, development methods, tools, techniques, and the skill and experience of the development team. It also takes into account application type and complexity, process factors, and reuse. It uses a normalized scale ranging from 0.1 to 40, where low values are associated with poor environments and tools and with complex systems, and high values indicate good environments, tools, and management, and well-understood projects.
Historic data has shown that the number of defects discovered decreases exponentially as the PI increases. Figure 5 shows the cumulative number of defects discovered for the same-sized software application at two different PIs (17 and 21, respectively). The project with the higher PI not only delivers the application nine months faster, but also includes about 360 fewer errors in total. It makes sense that when development teams improve, they tend to make fewer errors to begin with, thus decreasing the number of errors found during testing. Operating at a higher productivity level can drastically increase software quality.
Figure 5. Comparison of defects produced for the same size system using different PIs.
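An exponential relationship of this kind can be sketched by fitting a decay constant to two observations and interpolating between them. The defect totals used here (1,000 at PI 17 and 640 at PI 21) are hypothetical; they were chosen only so that the gap matches the roughly 360-error difference mentioned above, and the function names are illustrative.

```python
import math


def fit_decay(pi_a, defects_a, pi_b, defects_b):
    """Decay constant k in defects = A * exp(-k * PI), from two observations."""
    return math.log(defects_a / defects_b) / (pi_b - pi_a)


def predict_defects(pi, pi_ref, defects_ref, decay):
    """Defects expected at `pi`, given a reference point and a decay constant."""
    return defects_ref * math.exp(-decay * (pi - pi_ref))


# Hypothetical totals for the same-sized system at two productivity levels.
decay = fit_decay(17, 1000, 21, 640)

for pi in (17, 18, 19, 20, 21):
    print(f"PI {pi}: about {predict_defects(pi, 17, 1000, decay):.0f} defects")
```

Under an exponential model each one-point gain in PI removes the same percentage of defects, so the absolute savings are largest for teams starting from a low PI.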
Peak Staffing
The size of the development team also can impact the quality of a system, as larger teams tend to produce more errors than smaller teams. As shown in Figure 6, when comparing a large team (red) with a small team (grey) at the same project sizes, the small teams produce between 50 and 65 fewer errors than large teams at all project sizes. Additionally, they pay little, if any, schedule penalty and use significantly less effort. This finding can be especially useful when looking to identify areas of waste within an organization, because it shows that adding resources to a project does not always improve its quality.
Figure 6. Small teams produce fewer errors than large teams at all project sizes.
Software reliability and quality are two areas that should be addressed in the expectation-setting and project-negotiation stages of a project. Quality reflects how well software complies with or conforms to a given design, based on functional requirements or specifications, while reliability pertains to the probability of failure-free software operation for a specified period of time in a particular environment.
As a leader, you can use several strategies to improve reliability and quality. Keep the developed product size as small as possible, use smaller teams of people, and make regular investments in your environment to improve the efficiency and effectiveness of your development shop. All of these actions will pay reliability and quality dividends.
References
- B. Boehm, IEEE Software, Sept. 1987, pp. 84-85.
- S. McConnell, “An Ounce of Prevention,” IEEE Software, May/June 2001.
About the Authors
C. Taylor Putnam-Majarian is a Consulting Analyst at QSM and has over seven years of specialized data analysis, testing, and research experience. In addition to providing consulting support in software estimation and benchmarking engagements to clients from both the commercial and government sectors, Taylor has authored numerous publications about Agile development, software estimation, and process improvement, and is a regular blog contributor for QSM. Most recently, Taylor presented research titled Does Agile Scale? A Quantitative Look at Agile Projects at the 2014 Agile in Government conference in Washington, DC. Taylor holds a bachelor’s degree from Dickinson College.
Doug Putnam is Co-CEO for Quantitative Software Management (QSM) Inc. He has 35 years of experience in the software measurement field and is considered a pioneer in the development of this industry. Mr. Putnam has been instrumental in directing the development of the industry-leading SLIM Suite of software estimation and measurement tools, and is a sought-after international author, speaker, and consultant. His responsibilities include managing and delivering QSM software measurement services, defining requirements for the SLIM Product Suite, and overseeing the research activities derived from the QSM benchmark database.