Criteria for test completion
In the test plan the test manager describes the items to be tested, test cases, tools needed, scheduled activities, and assigned responsibilities. As the testing effort progresses many factors impact on planned testing schedules and tasks in both positive and negative ways. For example, although a certain number of test cases were specified, additional tests may be required. This may be due to changes in requirements, failure to achieve coverage goals, and unexpected high numbers of defects in critical modules. Other unplanned events that impact on test schedules are, for example, laboratories that were supposed to be available are not (perhaps because of equipment failures) or testers who were assigned responsibilities are absent (perhaps because of illness or assignments to other higherpriority projects). Given these events and uncertainties, test progress does not often follow plan. Tester managers and staff should do their best to take actions to get the testing effort on track. In any event, whether progress is smooth or bumpy, at some point every project and test manager has to make the decision on when to stop testing. Since it is not possible to determine with certainty that all defects have been identified, the decision to stop testing always carries risks. If we stop testing now, we do save resources and are able to deliver the software to our clients. However, there may be remaining defects that will cause catastrophic failures, so if we stop now we will not find them. As a consequence, clients may be unhappy with our software and may not want to do business with us in the future. Even worse there is the risk that they may take legal action against us for damages. On the other hand, if we continue to test, perhaps there are no defects that cause failures of a high severity level. Therefore, we are wasting resources and risking our position in the market place. Part of the task of monitoring and controlling the testing effort is making this decision about when testing is complete under conditions of uncertainly and risk. Managers should not have to use guesswork to make this critical decision. The test plan should have a set of quantifiable stop-test criteria to support decision making. The weakest stop test decision criterion is to stop testing when the project runs out of time and resources. TMM level 1 organizations often operate this way and risk client dissatisfaction for many projects. TMM level 2 organizations plan for testing and include stop-test criteria in the test plan. They have very basic measurements in place to support management when they need to make this decision. Shown in Figure 9.6 and described below are five stop-test criteria that are based on a more quantitative approach. No one criteria is recommended. In fact, managers should use a combination of criteria and cross-checking for better results. The stop-test criteria are as follows.
1 . A l l the Planned Tests That Were Developed Have Been Executed and Passed.
This may be the weakest criterion. It does not take into account the actual dynamics of the testing effort, for example, the types of defects found and their level of severity. Clues from analysis of the test cases and defects found may indicate that there are more defects in the code that the planned test cases have not uncovered. These may be ignored by the testers if this stop-test criteria is used in isolation.
2 . A l l Specified Coverage Goals Have Been Met.
An organization can stop testing when it meets its coverage goals as specified in the test plan. For example, using white box coverage goals we can say that we have completed unit test when we have reached 100% branch coverage for all units. Using another coverage category, we can say we have completed system testing when all the requirements have been covered by our tests. The graphs prepared for the weekly status meetings can be applied here to show progress and to extrapolate to a completion date. The graphs will show the growth of degree of coverage over the time.
3 . The Detection of a Specific Number of Defects Has Been Accomplished.
This approach requires defect data from past releases or similar projects. The defect distribution and total defects is known for these projects, and is applied to make estimates of the number and types of defects for the current project. Using this type of data is very risky, since it assumes the current software will be built, tested, and behave like the past projects. This is not always true. Many projects and their development environments are not as similar as believed, and making this assumption could be disastrous. Therefore, using this stop-criterion on its own carries high risks.
4 . The Rates of Defect Detection for a Certain Time Period Have Fallen Below a Specified Level.
The manager can use graphs that plot the number of defects detected per unit time. A graph such as Figure 9.5, augmented with the severity level of the defects found, is useful. When the rate of detection of defects of a severity rating under some specified threshold value falls below that rate threshold, testing can be stopped. For example, a stop-test criterion could be stated as: ―We stop testing when we find 5 defects or less, with impact equal to, or below severity level 3, per week.‖ Selecting a defect detection rate threshold can be based on data from past projects.
5 . Fault Seeding Ratios Are Favorable.
Fault (defect) seeding is an interesting technique first proposed by Mills . The technique is based on intentionally inserting a known set of defects into a program. This provides support for a stop-test decision. It is assumed that the inserted set of defects are typical defects; that is, they are of the same type, occur at the same frequency, and have the same impact as the actual defects in the code. One way of selecting such a set of defects is to use historical defect data from past releases or similar projects.
The technique works as follow. Several members of the test team insert (or seed) the code under test with a known set of defects. The other members of the team test the code to try to reveal as many of the defects as possible. The number of undetected seeded defects gives an indication of the number of total defects remaining in the code (seeded plus actual). A ratio can be set up as follows:
Detected seeded defects = Detected actual defects Total seeded defects Total actual defects
Using this ratio we can say, for example, if the code was seeded with 100 defects and 50 have been found by the test team, it is likely that 50% of the actual defects still remain and the testing effort should continue.When all the seeded defects are found the manager has some confidence that the test efforts have been completed.