Thursday, December 01, 2011

TestTechniques


DETAILED TOOL DESCRIPTIONS AND CHARACTERISTICS

Each tool description is presented in the following standard format:
·         Information input - The input information required to use this technique.
·         Information output - A description of the results of the technique.
·         Outline of method - A basic outline of the methodology. In some cases, it is beyond the scope of this appendix to outline the methodology fully; a reference is provided in such cases.
·         Example - An example of the technique, where practical. An example of some techniques would be lengthy and thus beyond the scope of this appendix. The mentions of specific tools in the text are for examples only; those tools are not being promoted for use.
·         Effectiveness - Profile information on the types of errors detected and the degree to which the technique detects those errors.
·         Applicability - Information about use during certain software development test phases and for specific types of software applications (e.g., scientific, complex control flow, large versus small applications).
·         Maturity - Information on the current status of the technique (e.g., still in the development and experimental stage, or tested and widely used).
·         User training - An estimate of the learning time and training needed to successfully implement the technique.
·         Costs - A description of the necessary resources (e.g., staff time, computer time, or any type of overhead) required to use the technique.


STATIC ANALYSIS TECHNIQUES

CODE REVIEWS AND WALKTHROUGHS

This section contains descriptions of peer reviews, formal reviews, and associated walkthroughs. These test techniques involve the reading or visual inspection of a program by a team whose objective is to find errors, not solutions to the errors.
Peer Review
In a peer review, project personnel perform a detailed study and evaluation of code, documentation, or specification. Code reading, round-robin reviews, walkthroughs, and inspections are examples of peer reviews that differ in formality, participant roles and responsibilities, input required, and output produced.
Information Input. The input to a particular peer review varies slightly depending on the form of review. Peer reviews generally require that a review package be distributed. This package contains a summary of the requirements that are the basis for the product being reviewed. Other input depends on the current stage of the software life cycle. For example, input to a peer review during the coding phase consists of program listings, design specifications, programming standards, and a summary of results from the design peer review previously held on the same product. Common input to particular forms of peer review includes:
·         Code-reading review:
§  Component requirements.
§  Design specifications.
§  Program listings.
§  Programming standards.
·         Round-robin review:
§  Component requirements.
§  Design or code specifications.
§  Program listings (if appropriate).
·         Walkthrough:
§  Component requirements.
§  Design or code specifications.
§  Program listings (if appropriate).
§  Product standards.
§  Supporting documentation (e.g., flowcharts, HIPO charts, and data dictionaries).
§  Question list (derived by participants before review).
·         Inspection:
§  Component requirements.
§  Design or code specifications.
§  Program listings (if appropriate).
§  Product standards.
§  Supporting documentation.
§  Checklists (containing descriptions of particular features to be evaluated).
Information Output. The output from a peer review varies. One output common to all peer reviews is a decision or consensus about the product under review. This is usually in the form of a group approval of the product as is, an approval with recommended modifications, or a rejection (and rescheduled review date). Specific output by form of peer review includes:
·         Code-reading and round-robin reviews:
§  Informal documentation of detected problems.
§  Recommendation to accept or reject reviewed product.
§  Dependency list (containing the relationship of coding constructs with variables).
·         Walkthrough:
§  Action list formally documenting problems.
§  Walkthrough form containing review summary and group decision.
·         Inspection:
§  Inspection memo and schedule defining individual roles and responsibilities and the inspection agenda.
§  Problem definition sheet.
§  Summary report documenting error correction status and related statistics on the errors.
§  Management report describing errors, problems, and component status to management.
Outline of Method. The peer review methodology and participant responsibilities vary. Summaries of these methodologies are provided later in this section. A few features, however, are common to all methodologies.
Most peer reviews are not attended by management. (An exception can be made when the project manager is also a designer, coder, or tester, usually on small projects.) Management's presence is contrary to the intent of peer reviews because it inhibits participants who might feel that they--not the product--are being evaluated.
Project review materials are assembled and distributed before a peer review is conducted, to give participants time to review the data.
At the end of most peer reviews, the group arrives at a decision about the status of the review product. This decision is usually communicated to management.
Most reviews are conducted in a group as opposed to individually by participants or by the project team itself. Most organizations involved in software development and maintenance use one of the following approaches:
·         Conventional team - A senior programmer directs the efforts of one or more less experienced programmers.
·         Egoless team - Programmers of approximately equal experience share product responsibilities.
·         Chief programmer team - A highly qualified senior programmer leads the efforts of other team members for whom specific roles and responsibilities (e.g., backup programmer, secretary, librarian) have been assigned.
The group participating in the peer review is not necessarily the same as the team that manages and completes the software product. The review group is likely to be composed of a subset of the project team along with other individuals as required by the form of review being held and the current stage of the life cycle in process. The benefits of peer reviews are unlikely to be attained if the group acts separately, without some designated responsibilities. The following are some roles commonly used in review groups (though not all are employed in a single review):
·         Group and review leader - The individual designated by management with planning, directing, organizing, and coordinating responsibilities. This individual usually has responsibilities after the review to ensure that recommendations are implemented.
·         Designer - The individual responsible for the translation of the product into a plan for its implementation.
·         Implementor - The individual responsible for developing the product according to the plan detailed by the designer.
·         Tester - The individual responsible for testing the product as developed by the implementor.
·         Coordinator - The individual designated with planning, directing, organizing, and coordinating responsibilities.
·         Producer - The individual whose product is under review.
·         Recorder - The individual responsible for documenting the review activities during the review.
·         User representative - The individual responsible for ensuring that the user's requirements are addressed.
·         Standards representative - The individual responsible for ensuring that product standards are conformed to.
·         Maintenance representative - The individual responsible for updates or corrections to the installed product.
·         Others - Individuals with specialized skills or responsibilities needed during the peer review.
Although the forms of peer reviews have some similarities and generally involve designation of participant roles and responsibilities, they are different in application. The remainder of this section summarizes the application methods associated with the forms of peer reviews previously introduced.
Code-Reading Review. Code reading is a line-by-line study and evaluation of program source code. It is generally performed on source code that has been compiled and is free of syntax errors. Some organizations, however, practice code reading on uncompiled source listings or handwritten coding sheets to remove syntax and logic errors prior to code entry. Code reading is commonly practiced on top-down, structured code; it is not cost-effective when performed on unstructured code.
No more than three or four members should belong to the code-reading review team. The producer sets up the review and is responsible for team leadership. The producer selects two or three programmers or analysts as reviewers on the basis of their experience, their responsibilities for interfacing programs, or other specialized skills.
The producer distributes the review input (see input section) about two days in advance. During the review, the producer and reviewers evaluate each line of code, checking for features that make the program more readable, usable, reliable, and maintainable. Two types of code reading may be performed: reading for understanding and reading for verification. Reading for understanding is performed when the reader desires an overall appreciation of how the program module works, its structure, what functions it performs, and whether it follows established standards. For Figure 1, which depicts the structure of a program component, a reviewer reading for understanding would review the modules in the following order: 1.0, 2.0, 2.1, 2.2, 3.0, 3.1, 3.2, and 3.3.


Figure 1. Structure of a Program Component
In contrast to this top-to-bottom approach, reading for verification implies a bottom-up review of the code. The component depicted previously would be perused in the following order: 3.3, 3.2, 3.1, 3.0, 2.2, 2.1, 2.0, and 1.0. In this manner, it is possible to produce a dependency list detailing parameters, control switches, table pointers, and internal and external variables used by the component. The list can then be used to ensure hierarchical consistency, data availability, and variable initialization. Reviewers point out any problems or errors detected while reading for understanding or verification during the review.
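As a purely hypothetical illustration (the variable names and layout are invented, since no standard format is prescribed), a dependency list produced during a verification reading of the component in Figure 1 might look like the following:

    Variable                     Defined in    Referenced in     Modified in
    TBLPTR (table pointer)       2.0           2.1, 2.2, 3.1     3.1
    ERRSW (control switch)       1.0           2.2, 3.0          3.3
    COUNT (internal variable)    2.1           3.1, 3.2          3.2
    BUFLEN (parameter)           caller        2.0, 3.0          --

Scanning the finished list reveals, for example, any variable that is referenced in a lower-level module before any module has defined it.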
The team then makes an informal decision about the acceptability of the code product and may recommend changes. The producer notes suggested modifications and is responsible for all changes to the source code. Suggested changes are evaluated by the producer and need not be implemented if the producer determines that they are invalid.
There is no mechanism to ensure that changes are implemented or that the review is followed up.
Round-Robin Review. In a round-robin review, each participant is given an equal opportunity to study and present the evaluation of the product being reviewed.
A round-robin review can be performed during any phase of the product life cycle and is also useful for documentation review. In addition, some variations of the round-robin review incorporate the best features from other peer review forms but continue to use the alternating review leader approach. For example, during a round-robin inspection, each item on the inspection checklist is made the responsibility of alternating participants.
The usual number of personnel involved in this type of peer review is four to six. The meeting is scheduled by the producer, who also distributes high-level documentation as input. The producer is either the first review leader or assigns this responsibility to another participant. The temporary leader guides the other participants (e.g., implementors, designers, testers, users, and maintenance representatives) through the first unit of work. This unit may be a module, paragraph, line of code, inspection item, or other unit of manageable size. All participants (including the leader) have the opportunity to comment on the unit before the next leader begins the evaluation of the next unit. The leaders are responsible for noting major comments raised about their piece of work. At the end of the review, all the major comments are summarized and the group decides whether to approve the product. No formal mechanism for review follow-up is used.
Walkthroughs. This type of peer review is more formal than the code-reading or round-robin reviews. Distinct roles and responsibilities are assigned before the review. Preparation before the review is greater, and a more formal approach to problem documentation is stressed. Another key feature of this review is that the producer presents the product. The most common walkthroughs are those held during design and coding; more recently, however, they have been applied to specifications documentation and test results.
The producer schedules the review and assembles and distributes input. In most cases, the producer selects the walkthrough participants (although sometimes this is done by management) and notifies them of their roles and responsibilities. The walkthrough is usually conducted with less than seven participants and lasts no more than two hours. If more time is needed, there should be a break or the product should be reduced in size. Roles usually included in a walkthrough are producer, coordinator, recorder, and representatives of user, maintenance, and standards organizations.
Although the review is opened by the coordinator, the producer is responsible for leading the group through the product. In the case of design and code walkthroughs, the producer simulates the operation of the component, allowing each participant to comment based on that individual's area of specialization. A list of problems is kept, and at the end of the review, each participant signs the list or other walkthrough form indicating whether the product is accepted as is, accepted with recommended changes, or rejected. Suggested changes are made at the discretion of the producer. There are no formal means of follow-up on the review comments. If the walkthrough review is used for products throughout the life cycle, however, comments from past reviews can be discussed at the start of the next review.
Inspections. Inspections are the most formal, commonly used form of peer review. The key feature of an inspection is the use of checklists to facilitate error detection. These checklists are updated as statistics indicate that certain types of errors are occurring more or less frequently than in the past. The most common types of inspections are conducted on the product design and code, although inspections may be used during any life cycle phase. Inspections should be short because they are often intensive; therefore, the product component to be reviewed must be small. Specifications or designs that result in 50 to 100 lines of code are usually manageable. This translates into an inspection of 15 minutes to one hour, although complex components may require as much as two hours. In any event, inspections of more than two hours are generally less effective and should be avoided.
Two or three days before the inspection, the producer assembles the input to the inspection and gives it to the coordinator for distribution. Participants are expected to study and make comments on the materials before the review.
A participant other than the producer leads the review. Generally, the individual who has the greatest involvement in the next phase of the life cycle is designated as reader; for example, a requirements inspection would likely be led by a designer, a design inspection by an implementor, and so forth. The exception to this is the code inspection, which is led by the designer. The inspection is organized and coordinated by an individual designated as the group leader or coordinator.
The reader goes through the product component, using the checklist as a means to identify common types of errors as well as standards violations. A primary goal of an inspection is to identify items that can be modified to make the component more understandable, maintainable, or usable. Participants discuss any issues that they identified in preinspection study.
At the end of the inspection, an accept or reject decision is made by the group, and the coordinator summarizes all the errors and problems detected and gives this list to all participants. The individual whose work was under review (e.g., designer, implementor, tester) uses the list to make revisions to the component. When revisions are implemented, the coordinator and producer go through a minireview using the problem list as a checklist.
The coordinator then completes management and summary reports. The summary report is used to update checklists for subsequent inspections.
Example. This section contains an example describing a code-reading review. Three days before the estimated completion of coding, the producer of a program component begins preparation for a code-reading review. The component is composed of 90 lines of FORTRAN code and associated comments. The producer obtains copies of the source listing and requirements and design specifications for the component and distributes them to three peers, notifying them of the review date and place.
Each reviewer reads the code for general understanding, reviewing a major function and its supporting functions before reviewing the next major function. For example, one reviewer notes an exception to the programming standards. Another thinks that the data names are not meaningful. The third has found several comments that inaccurately represent the function they describe. Each reviewer makes a note of these points as well as any comments about the structure of the component. Next, the requirements are studied to ensure that each requirement is addressed by the component. It appears that the requirements have all been met.
The producer leads the code-reading review. After a brief description of the component and its interfaces, the producer leads the reviewers through the code. Rather than progressing through the component from top to bottom, the decision is made to perform code reading from the bottom up to verify the component’s correctness.
As the code is being perused, one reviewer is made responsible for keeping a dependency list. As each variable is defined, referenced, or modified, a notation is made on the list.
The verification code reading uncovers the use of a variable before its definition. The producer documents this error on an error list. In addition, each of the problems detected earlier during the code reading (as performed by each individual) is discussed and documented.
At the end of the review, the producer summarizes the error list to the group. Because no major problems were detected, the participants decide to accept the code with the agreed-to minor modifications. The producer then uses the problem list for reference when making modifications to the component.
Effectiveness. Peer reviews offer the following qualitative benefits:
·         Higher status visibility.
·         Decreased debugging time.
·         Early detection of design and analysis errors that would be much more costly to correct in later development phases.
·         Identification of design or code inefficiencies.
·         Increased adherence to standards.
·         Increased program readability.
·         Increased user satisfaction.
·         Communication of new ideas or technology.
·         Increased maintainability.
Little data is available that identifies the quantitative benefits attributable to the use of a particular form of peer review. One source, however, estimates that the number of errors in production programs was reduced by a factor of 10 through the use of walkthroughs. Computational and logic errors are the types of errors detected by this technique. Another source estimates that a project employing inspections achieved 23% higher programmer productivity than with walkthroughs. No data was available indicating the amount of increased programmer productivity attributable to the inspection alone.
Applicability. Peer reviews are applicable to large or small projects during design verification, unit test, and module test phases and are not limited to project type or complexity.
Maturity. Peer reviews are widely used and have been demonstrated to be effective in detecting errors despite the informality of the methodology. This technique has been in use for some time.
User Training. None of the peer reviews discussed requires extensive training to implement. They do require familiarity with the concept and methodology involved. Peer reviews are most successful when the individual with responsibility for directing the review is knowledgeable about the process and its intended results.
Costs. The reviews require no special tools or equipment. The main cost involved is staff time and is negligible compared with the expected returns. Most references suggest that peer reviews should be no longer than two hours, preferably a half hour to one hour. Preparation time can amount to as little as half an hour and should not require longer than half a day per review.
Formal Reviews
Formal reviews constitute a series of reviews of a software system, usually conducted at major milestones in the software development life cycle. They are used to improve development visibility and product quality and provide the basic means of communication between the project team, management, and user representatives. Formal reviews are most often implemented for medium-size to large development projects, although small projects can employ a less rigorous form of the technique.
The most common types of formal reviews are held at the completion of the requirements, preliminary design, detailed (critical) design, coding, and installation phases. Although names of these reviews may vary by organization, some generally recognized names are requirements review, preliminary design review (PDR), critical design review (CDR), code construction review, and acceptance test review.
Information Input. The input to a particular formal review varies slightly depending on the stage of the life cycle just completed. In general, each formal review requires that a review package be assembled and then distributed at a review meeting. This package contains a summary of the requirements that are the basis for the product being reviewed. These and other common input for formal reviews fall into the following three categories:
·         Project documents - These are produced by the development team to describe the system. The specific documents required depend on the life cycle phase just completed. For example, a review conducted at the conclusion of the requirements phase would need the functional specifications or system/subsystem specifications.
·         Backup documentation - This type of input is documentation that, although usually not contractually required, is needed to support systems development or otherwise record project progress. Specific types of backup documentation vary by the phase for which the review is performed. Rough drafts of user and maintenance manuals are examples of backup documentation examined during a design review to plan for continuation of the project. Program listings are an example of backup documentation used during a code construction review.
·         Other input - All other input is primarily used to clarify or expand on the project and backup documents. Such input may include overlays and slides prepared by project management for the formal review meeting, the minutes of the previous phase review meeting, or preliminary evaluations of the project documents under review.
Information Output. The information output associated with a formal review generally falls into the following categories:
·         Management reports - These are written reports from the project manager to senior management describing the results of the review, problems revealed, proposed solutions, and any senior management assistance required.
·         Outside reviewer reports - These are written reports to the project manager from review participants who have not worked on the project. These reports provide outside reviewers an opportunity to express their appraisal of the project status and the likelihood of meeting project objectives. The reports also allow them to make suggestions for correcting any deficiencies noted.
·         Action items - This is a list of all required post-review action items to be completed before a review can be satisfactorily closed out. With each item is an indication of whether customer or contractor action is required for resolution.
·         Review minutes - This is a written record of the review meeting proceedings as recorded by a designee of the review team leader. The minutes of the review are distributed to each review team member.
·         Decision to authorize next phase - A decision must be reached at any formal review to authorize initiation of the next life cycle phase.
·         Understanding of project status--At the conclusion of any formal review, there should be a common understanding of project status among the project personnel present at the review.
Outline of Method. The methodology of formal reviews is outlined as follows.
Participants. The participants in a formal review are usually selected from the following personnel:
·         Project manager.
·         Project technical leader.
·         Other project team members (e.g., analysts, designers, programmers).
·         Client.
·         User representatives.
·         Line management of project manager.
·         Outside reviewers (e.g., quality assurance personnel, experienced people on other projects).
·         Functional support personnel (e.g., finance or technology personnel).
·         Subcontractor management.
·         Others (e.g., configuration management representative or maintenance representative).
The Process. Formal reviews should be scheduled and organized by project management. Each review must be scheduled at a meaningful point during software development. The formal review effectively serves as the phase milestone for any particular phase and includes the following steps:
·         Preparation - All documentation that serves as source material for the review must be prepared beforehand. These materials may be distributed to each review participant before the review meeting to allow reviewers sufficient time to review and make appraisals of the materials. The location and time of the review meeting must be established, participants identified, and an agenda planned.
·         Overview presentation - At the review meeting, all applicable product and backup documentation is distributed and a high-level summary of the product to be reviewed is presented. Objectives of the review are also given.
·         Detailed presentation - A detailed description of the project status and progress achieved during the review period is presented. Problems are identified and openly discussed by the review team.
·         Summary - A summary of the results of the review is given. A decision about the status of the review is made. A list of new action items is constructed and responsibility for completion of each item is assigned.
·         Follow-up - The completion of all action items is verified. All reports are completed and distributed.
Example. Two weeks before the estimated completion of the requirements document, the producer of a program receives notification from the client that a requirements review is desired. The client proceeds in selecting a chairperson to conduct the review. The chairperson, in turn, selects as participants in the review the project manager, project technical lead, a member of the requirements definition team, and a member of the requirements analysis team. The client also has indicated a desire to include the following personnel in the review: a representative from the user department, a reviewer from an independent computing organization, and a representative from the client’s own organization.
The chairperson informs all review participants of the date, time, and location of the review. Ten days before the meeting, the chairperson distributes to each review participant all documents (e.g., requirements document, preliminary plans, other review material) produced by the requirements definition and analysis teams. In preparation for the meeting, each reviewer critically reviews the documents. The user representative is puzzled over the inclusion of a requirement concerning the use of a proposed database. The reviewer from the outside computing organization notes that the version of the operating system to be used in developing the software is outdated. A representative of the client organization has a question concerning the use of a subcontractor in one phase of the project. Each reviewer submits comments to the chairperson before the scheduled review meeting. The chairperson directs each comment to the appropriate requirements team member to allow proper time for responses to be prepared.
The requirements review meeting begins with a brief introduction by the chairperson. All participants are introduced, review materials are listed, and the procedure for conducting the review is presented. A presentation is then given summarizing the problem that led to the requirements and the procedure that was used to define these requirements. At this time, the user representative inquires about the data base requirement as stated in the requirements document. The project technical leader responds to this question. The user representative accepts this response, which is so noted by the recorder in the official minutes of the meeting.
The meeting continues with an analysis of the requirements and a description of the contractor's planned approach for developing a solution to the problem. At this time, the questions from the client representative and the outside computing organization are discussed. The project manager responds to questions concerning the use of a subcontractor on the project. Certain suggestions have been made that require the approval of the subcontractor and these are placed on the action list. The technical leader acknowledges the problems that the independent computing organization has pointed out and notes that certain software vendors must be contacted to resolve them. This item is also placed on the action list. A general discussion follows among all review team members.
At the end of the review, the chairperson seeks a decision from the reviewers about the acceptability of the requirements document. They agree to give their approval, providing that the suggestions noted on the action list are thoroughly investigated. All participants agree to this decision and the meeting is adjourned.
The chairperson distributes a copy of the minutes of the meeting, including action items, to all participants. The project manager informs the subcontractor of the suggestions made at the meeting. The subcontractor subsequently agrees with the suggestions. The project technical leader contacts the software vendor from which the current operating system was purchased and learns that the latest version can easily be installed before it is needed for this project. The technical leader notifies the project manager, who subsequently approves the purchase of the new version. The requirements document is revised to reflect the completion of these action items. The chairperson verifies that all action items have been completed. The project manager submits a management report to senior management, summarizing the review.
Effectiveness. Because the cost to correct an error increases rapidly as the development process progresses, detection of computational and logic errors by the use of formal reviews is an attractive prospect.
Some qualitative benefits attributable to the use of formal reviews are:
·         Highly visible systems development.
·         Early detection of design and analysis errors.
·         More reliable estimating and scheduling.
·         Increased product reliability and maintainability.
·         Increased education and experience of all individuals involved in the process.
·         Increased adherence to standards.
·         Increased user satisfaction.
Little data is available that identifies the quantitative benefits attributable to the use of formal reviews.
This technique is most effective on large projects. The costs of performing formal reviews on small projects, however, may be sufficiently large to warrant lessening the formality of the reviews or even eliminating or combining some of them.
Applicability. Formal reviews are applicable to large or small projects during design verification and unit test phases and are not limited by project type or complexity.
Maturity. This is a widely used technique.
User Training. This technique does not require any special training. The success or failure of a formal review, however, depends on the personnel who attend. They must be intelligent, skilled, knowledgeable in a specific problem area, and able to interact effectively with other team members. The experience and expertise of the individual responsible for directing the review are also critical to the success of the effort.
Costs. The method requires no special tools or equipment. The main cost involved is staff time, which is negligible compared with the expected returns. Formal review meetings should not require more than one to two hours. Preparation time can amount to as little as half an hour and should not require longer than half a day per review.
ERROR AND ANOMALY DETECTION TECHNIQUES
Testing techniques that deal with syntax checking (e.g., uninitialized variables, unreachable code), coding-standards auditing, and data checking (e.g., set-use, data flow, unit consistency) are described in this section.
Code Auditing
Code auditing involves examining the source code and determining whether prescribed programming standards and practices have been followed.
Information Input. The information input to code auditing is the source code to be analyzed.
Information Output. The information output from code auditing is a determination of whether the code under analysis adheres to prescribed programming standards. If errors exist, a description of which standards have been violated and where the violations occur is generated as error messages included with a source listing or as a separate report. Other diagnostic information (e.g., a cross-reference listing) may also be output as an aid to making error corrections.
Outline of Method. Code auditing provides an objective, reliable means of verifying that a program complies with a specified set of coding standards. Some common programming conventions that code auditing can check for are:
·         Correct syntax - Do all program statements conform to the specifications of the language definition?
·         Portability - Is the code written so that it can easily operate on different computer configurations?
·         Use of structured programming constructs - Does the code make proper use of a specified set of coding constructs (e.g., IFTHENELSE or DOWHILE)?
·         Size - Is the length of any program unit confined to a specified number of statements?
·         Commentary - Is each program unit appropriately documented? For example, does a block of comments that indicate the function of the unit and the function of each variable used precede each unit?
·         Naming conventions - Do the names of all variables, routines, and other symbolic entities follow prescribed naming conventions?
·         Statement labeling - Does the numeric labeling of statements follow an ascending sequence throughout each program unit?
·         Statement ordering - Do all statements appear in a prescribed order? For example, in a FORTRAN program, do the data specification statements appear before the first executable statement of a routine and all FORMAT statements appear at the end?
·         Statement format - Do all statements follow a prescribed set of formatting rules that improve program clarity? For example, are the DOWHILE loops appropriately indented?
As demonstrated by this list, code auditing may vary in sophistication according to function. Each form, however, requires some kind of syntax analysis. Code must be parsed and given an internal representation suitable for analysis. Because this type of processing is found in many static analysis tools, code auditing may be part of a more general tool having many capabilities. For example, a compiler is a form of code auditing that checks for adherence to the specifications of a language definition. PFORT, a tool used to check FORTRAN programs for adherence to a portable subset of American National Standard (ANS) FORTRAN, also has the capability of generating a cross-reference listing.
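To make this concrete, the following is a minimal sketch (in Python, chosen here only for illustration) of a code auditor that checks two of the conventions listed above, unit size and naming conventions, against FORTRAN-style source text. The 50-line size limit, the six-character naming rule, and the function names are assumptions made for the example and do not describe any particular tool such as PFORT.

import re
import sys

MAX_UNIT_LINES = 50                                  # assumed project standard for unit size
NAME_RULE = re.compile(r"^[A-Z][A-Z0-9]{0,5}$")      # assumed six-character naming convention

def audit(path):
    """Report violations of two simple coding standards in a FORTRAN-like source file."""
    violations = []
    unit_name, unit_start = None, 0
    with open(path) as src:
        for lineno, line in enumerate(src, start=1):
            text = line.strip()
            # Start of a program unit (SUBROUTINE or FUNCTION statement).
            m = re.match(r"(SUBROUTINE|FUNCTION)\s+(\w+)", text, re.IGNORECASE)
            if m:
                unit_name, unit_start = m.group(2), lineno
                if not NAME_RULE.match(unit_name.upper()):
                    violations.append((lineno, f"name '{unit_name}' violates the naming convention"))
            # End of a program unit; simplistic: any END statement closes the unit.
            if re.match(r"END\b", text, re.IGNORECASE) and unit_name:
                length = lineno - unit_start + 1
                if length > MAX_UNIT_LINES:
                    violations.append((lineno, f"unit '{unit_name}' is {length} lines (limit {MAX_UNIT_LINES})"))
                unit_name = None
    for lineno, message in violations:
        print(f"{path}:{lineno}: {message}")
    return violations

if __name__ == "__main__":
    audit(sys.argv[1])

A production code auditor rests on a real scanner and parser, as noted above, rather than on pattern matching, but the style of reporting is similar.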
Code auditing is useful to programmers as a means of self-checking their routines before they are turned over for integration testing. These tools are also of value to quality assurance personnel during integration testing, before formal validation testing, and again before customer delivery.
Example. An example of code auditing follows.
Application. A flight control program is to be coded entirely in PFORT, a portable subset of ANS FORTRAN. The program is to be delivered to a government military agency that installs the software on various computers. In addition, the customer requires that each routine in the program be clearly documented in a prescribed format. All internal program comments are to be later compiled as a separate source of documentation for the program.
Error. A named common block occurs in several routines in the program. In one routine, the definition of a variable in that block has been omitted because the variable is not referenced in that routine. This is a violation of a rule defined in PFORT that requires that the total length of a named common block agree in all occurrences of that block.
Error Discovery. A code auditor that checks FORTRAN for adherence to PFORT detects this error immediately. The programmer is informed that the routine is to be appropriately modified and that any confusion over the use of the variable is to be clarified in the block of comments that describe the function of each defined variable in the routine. A code auditor that checks for the presence of appropriate comments in each routine is used to verify that the use of the variable is appropriately documented. At the end of code construction, all such internal program documentation is collated and summarized by another code auditor that processes machine-readable documentation embedded in source code.
Effectiveness. Code auditing is very effective in certifying that software routines have been coded in accordance with prescribed standards. Automated code audits are much more reliable than manually performed audits and are highly cost-effective because they are less time-consuming.
Applicability. Code auditing can be applied generally to any type of source code and is applicable during design verification through various test phases. Each specific tool is language dependent (i.e., operates correctly only for specified source languages), and only accepts input that appears in a prescribed format.
Maturity. Code auditors have been used with favorable results and have been extensively automated. Code auditing is a highly mature technique.
User Training. No special training is required to implement this technique. Because code auditing may be implemented by a wide variety of personnel (e.g., programmers, managers, quality assurance personnel, and customers), ease of use is an important attribute. To use this technique effectively, however, familiarity with the standards on which the auditing is based is required.
Costs. Code auditing is generally inexpensive; the overhead is usually no more than the cost of a compilation.
Interface Checking
Interface checking involves analyzing the consistency and completeness of the information and control flow between system components, modules, or procedures.
Information Input. Information needed to do interface checking consists of:
·         A formal representation of system requirements.
·         A formal representation of system design.
·         A program coded in a high-level language.
Information Output. The output information of this technique describes module interface inconsistencies and errors. The information can be provided as error messages included with a source listing or as a separate report.
Outline of Method. Interface checking analyzes a computer-processable form of a software system requirements specification, design specification, or code. The method for each of the three representations (requirements, design, and code) is illustrated in the following sections through an examination of the interface checking capabilities of three existing tools. (These three tools were chosen for explanatory purposes; their mention should not be regarded as a recommendation.)
PSL/PSA (Problem Statement Language/Problem Statement Analyzer) is an automated requirements specification tool. Basically, PSL/PSA describes requirements as a system of input, processes, and output. Both information and control flow are represented within PSL. Interface checking performed by PSA consists of ensuring that all data items are used and generated by a process and that all processes use data. Incomplete requirements specifications are therefore easily detected.
The Design Assertion Consistency Checker (DACC) is a tool that analyzes module interfaces based on a design that contains information describing, for each module, the nature of the input and output. This information is specified through the use of assertions to indicate the number and order of input, data types, units (e.g., feet or radians), and acceptable ranges. DACC checks module calls as specified in the design against the assertions for the called module for consistency and produces an inconsistency report when assertions have been violated.
PFORT is a static analysis tool that is used primarily for checking FORTRAN programs for adherence to a portable subset of the FORTRAN language, but it also performs subprogram interface checking. PFORT checks the matching of actual with dummy arguments, looking for unsafe references (e.g., constants passed as arguments that are subject to change within the called subprogram).
Interface checking capabilities can also be built into a particular language compiler. For example, Ada provides a parameter-passing mechanism whereby parameters are identified as input, output, or input/output. Moreover, data type and constraints (e.g., range and precision) must match between the actual arguments and the formal parameters (in nongeneric subprograms).
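The following is a minimal sketch, in Python, of the kind of design-assertion consistency check a tool such as DACC performs. The assertion table, the call records, and all names are invented for illustration; they do not reflect the actual input format of DACC or PFORT.

# Assumed design assertions: for each module, the expected name, type,
# and units of its parameters (illustrative format only).
ASSERTIONS = {
    "GETREC": [("file_id", "integer", None),
               ("record", "array", None),
               ("err_label", "label", None)],
    "CONVRT": [("angle", "real", "radians")],
}

# Module calls as specified in the design: module name plus actual arguments.
CALLS = [
    ("GETREC", [("unit7", "integer", None), ("buffer", "array", None)]),   # missing err_label
    ("CONVRT", [("heading", "real", "degrees")]),                          # unit mismatch
]

def check_interfaces(assertions, calls):
    """Compare each call against the called module's assertions and report inconsistencies."""
    report = []
    for module, actuals in calls:
        formals = assertions.get(module)
        if formals is None:
            report.append(f"{module}: called but not defined in the design")
            continue
        if len(actuals) != len(formals):
            report.append(f"{module}: {len(actuals)} arguments passed, {len(formals)} expected")
            continue
        for (a_name, a_type, a_unit), (f_name, f_type, f_unit) in zip(actuals, formals):
            if a_type != f_type:
                report.append(f"{module}: argument '{a_name}' is {a_type}, formal '{f_name}' is {f_type}")
            if f_unit and a_unit != f_unit:
                report.append(f"{module}: argument '{a_name}' is in {a_unit}, expected {f_unit}")
    return report

for line in check_interfaces(ASSERTIONS, CALLS):
    print(line)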
Example. An example of interface checking is presented in this section.
Application. A statistical analysis package written in FORTRAN uses a file access system to retrieve records containing data used in the analysis.
Error. The primary record retrieval subroutine is always passed, as the last argument in its argument list, a statement number in the calling program that is to receive control if an abnormal file-processing error occurs. One calling program, however, fails to supply the needed argument. The compiler is not able to detect the error. Moreover, the particular FORTRAN implementation is such that no execution-time error occurs until a return to the unspecified statement number is attempted, at which time the system crashes.
Error Discovery. This error can easily be detected through the use of an interface checker at either the design (e.g., DACC) or coding (e.g., PFORT) phase of software development. Both DACC and PFORT can detect incorrect numbers of arguments.
Effectiveness. Interface checking is effective at detecting a class of errors that can be difficult to isolate if left to testing. This technique generally checks for:
·         Modules that are used but not defined.
·         Modules that are defined but not used.
·         Incorrect number of arguments.
·         Data type mismatches between actual and formal parameters.
·         Data constraint mismatches between actual and formal parameters.
·         Data usage anomalies.
Interface checks are generally more cost-effective if provided as a capability within another tool (e.g., as a compiler, data flow analyzer or a requirements/design specification tool).
Applicability. Interface checking is applicable during algorithm confirmation through integration test and is independent of application type and size.
Maturity. This technique is widely used with the aid of automated tools (e.g., DACC and PFORT).
User Training. This technique requires only minimal effort to learn.
Costs. Interface checking can be quite inexpensive, usually much less than the cost of a compilation.

Physical Units Checking
Many scientific, engineering, and control programs perform computations whose results are interpreted in terms of physical units (e.g., feet, meters, watts, and joules). Physical units checking enables specification and checking of units in program computations in a manner similar to dimensional analysis. Operations between variables that are not commensurate (e.g., adding gallons and feet) are detected.
Information Input. Units checking requires three things to be specified within a program: the set of elementary units used (e.g., feet, inches, acres), relationships between the elementary units (e.g., one foot = 12 inches, one acre = 43,560 square feet), and the association of units with program variables. The programming language must support such specifications, or the program must be preprocessed by a units checker.
Information Output. The information output depends on the specific capabilities of the language processor or preprocessor. At a minimum, all operations involving variables that are not commensurate are detected and reported. If variables are commensurate but not identical--that is, they are the same type of quantity (e.g., units of length) but one requires application of a scalar multiplier to place it in the same units as the other--the system may insert the required multiplication into the code or may only report what factor must be applied by the programmer.
Outline of Method. The specification of the input items described above is the extent of the actions required of the user. Some systems also allow a units expression to be associated with an expression in the actual program. For example, the programmer may write LOTSIZE = (LENGTH * WIDTH = feet^2), where the parenthesized Boolean expression asserts that the product of length and width must be in units of square feet. Verifying that LENGTH times WIDTH actually yields square feet is the responsibility of the processing system.
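As a minimal sketch of how such a processing system can operate, the following Python fragment represents each unit as a conversion factor plus a vector of exponents over the elementary units, which is essentially dimensional analysis. The unit table mirrors the cylinder example that follows; the function names and reporting format are assumptions made for illustration.

# Each unit is (factor, exponents): exponents map elementary units (here just
# "feet") to their powers; factor converts a value to the base unit.
UNITS = {
    "inches": (1.0 / 12.0, {"feet": 1}),
    "feet":   (1.0,        {"feet": 1}),
    "feet^2": (1.0,        {"feet": 2}),
    "feet^3": (1.0,        {"feet": 3}),
    "feet^4": (1.0,        {"feet": 4}),
    "acre":   (43560.0,    {"feet": 2}),
}

def commensurate(u1, u2):
    """Two units are commensurate if they have the same exponent vector."""
    return UNITS[u1][1] == UNITS[u2][1]

def check_assignment(target_unit, source_unit):
    """Report whether an assignment is legal, needs a conversion factor, or is an error."""
    if not commensurate(target_unit, source_unit):
        return f"error: {source_unit} is not commensurate with {target_unit}"
    factor = UNITS[source_unit][0] / UNITS[target_unit][0]
    if abs(factor - 1.0) > 1e-12:
        return f"warning: multiply the right side by {factor} to convert {source_unit} to {target_unit}"
    return "ok"

# area (acres) assigned a sum in square feet: commensurate, but the conversion factor is missing.
print(check_assignment("acre", "feet^2"))
# volume (cubic feet) assigned an expression whose dimensions are length^4: not commensurate.
print(check_assignment("feet^3", "feet^4"))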
Example. A short program in Pascal-like notation is shown for computing the volume and total surface area of a right circular cylinder. The program requires as input the radius of the circular base and the height of the cylinder. Because of peculiarities in the usage environment of the program, the radius is specified in inches, the height in feet; volume is required in cubic feet, and the surface area in acres. Several errors are present in the program, all of which would be detected by the unit checker.
In the following section, comments are made explaining the program, the errors it contains, and how they would be detected. The comments are keyed by line number to the program.
·         Line 2 - All variables in the program that are quantities are expressed in terms of these basic units.
·         Line 3 - These are the relationships between the units known to the units checker.
·         Lines 5 through 10 - Variable radius is in units of inches, height is in units of feet, and so forth.
·         Line 12 - Input values are read into variables radius and height.
·         Line 13 - lateral-surface must be expressed in square feet; (radius/12) is in feet and can be so verified by the checker.
·         Line 15 - lateral-surface and top-surface are both expressed in square feet; therefore, their sum is in square feet also. Area is expressed in acres, however, and the checker issues a message that although the two sides are commensurate, the conversion factor of 43,560 was omitted from the right side of the assignment.
·         Line 16 - The checker detects that the two sides of the assignment are not commensurate. The right side is in units of feet^4, the left is in feet^3.
(1)   program cylinder (input, output);
(2)   elementary units inches, feet, acre;
(3)   units relationships feet = 12 inches; acre = 43,560 feet^2;
(4)   constant pi = 3.1415927;
(5)   var            radius (*inches*),
(6)                    height (*feet*),
(7)                    volume (*feet^3*),
(8)                    area (*acre*),
(9)                    lateral-surface (*feet^2*),
(10)                  top-surface (*feet^2*): real;
(11) begin
(12)                  read (radius, height);
(13)                  lateral-surface := 2 * pi * (radius/12) * height;
(14)                  top-surface := pi * (radius/12)^2;
(15)                  area := lateral-surface + 2 * top-surface;
(16)                  volume := pi * (radius^3 * height);
(17)                  write (area, volume);
(18) end;
Effectiveness. The effectiveness of units checking is limited only by the capabilities of the units processor. Simple unit checkers may be able to verify that two variables are commensurate but not to determine whether proper conversion factors have been applied. That is, a relationship such as one foot = 12 inches may not be fully used in checking the computations in a statement (e.g., line 13 of the example). The assertion was made there that (radius/12) would be interpreted as converting inches to feet. The checker may not support this kind of analysis, however, to avoid ambiguities with such expressions as "one-twelfth of the radius."
Applicability. Certain application areas (e.g., engineering and scientific) often deal with physical units. In others, however, it may be difficult to find analogies to physical units. In particular, if a program deals only in one type of quantity (e.g., dollars) the technique would not be useful. Units checking can be performed during unit and module testing.
Maturity. This technique has been tested and is widely used.
User Training. Dimensional analysis is commonly taught in first-year college physics and statics courses; conversion from English to metric units is a simple procedure. Direct application of these principles in programming, with the use of a units checker, should require no additional training beyond understanding the capabilities of the specific units checker and the means for specifying units-related information.
Costs. If incorporated directly in a compiler, the cost of the units checking capabilities should be negligible. If a preprocessor is used, such systems are typically much slower than a compiler (perhaps operating at one-tenth of compilation speed), but only a single analysis of the program is required. The analysis is repeated only when the program is changed.
Data Flow Analysis
Data flow analysis determines the presence or absence of errors that can be represented as particular sequences of events in a program's execution. This anomaly detection technique is not to be confused with data-flow guided testing, a specialized test technique used primarily for compiler design and optimization.
Information Input. Data flow analysis algorithms operate on annotated graph structures that represent the program events and the order in which they can occur. Specifically, two types of graph structures are required: a set of annotated flow graphs and a program invocation (or call) graph. There must be one annotated flow graph for each of the program's procedures. The flow graph's nodes represent the execution units (usually statements) of the procedure, and its edges indicate which execution units may follow which other execution units. Each node is annotated with indications of which program events occur as a consequence of its execution. The purpose of the program invocation (call) graph is to indicate which procedures can invoke which others. Its nodes represent the procedures of the program, and its edges represent the invocation relation.
Information Output. The output of data flow analysis is a report on the presence of any specified event sequences in the program. If any such sequences are present, the identity of the sequence is specified with a sample path along which the illegal sequence can occur. The absence of any diagnostic messages concerning the presence of a particular event sequence is a reliable indicator of the absence of any possibility of that sequence.
Outline of Method. Data flow analysis relies basically on algorithms from program optimization to determine whether any two particular specified events can occur in sequence. Taking as input a flow graph annotated with all events of interest, these algorithms address two questions: whether there exists a program path along which the two events occur in sequence, and whether the two events must occur in sequence on all program paths. To determine illegal event sequences of length three or more, these basic algorithms can be applied in succession.
A major difficulty arises in the analysis of programs having more than one procedure, because the procedure flow graphs often cannot be completely annotated before data flow analysis. Flow graph nodes representing procedure invocations must be left either partially or completely unannotated until the flow graphs of the procedures they represent have been analyzed. Therefore, the order of analysis of the program's procedures is critical. This order is determined by a postorder traversal of the invocation graph, in which the bottom-level procedures are visited first, then those that invoke them, and so forth until the main-level procedure is reached. For each procedure, the data flow analysis algorithms must determine the events that can possibly occur both first and last and then make this information available for annotation of all nodes representing invocations of this procedure. Only in this way can it be ensured that any possible illegal event sequence is determined.
Example. The standard example of the application of data flow analysis is the discovery of references to uninitialized program variables. In this case, the program events of interest are the definition of a variable, the reference to a variable, and the undefinition of a variable. Therefore, all procedure flow graphs are annotated to indicate which specific variables are defined, referenced, and undefined at which nodes. Data flow analysis algorithms are then applied to determine whether the undefinition event can be followed by the reference event for any specific variable without any intervening definition event for that variable. If so, a message is produced, indicating the possibility of a reference to an uninitialized variable and a sample program path along which this occurs. A different algorithm is also used to determine whether for a specific variable undefinition must, along all paths, be followed by reference without intervening definition. For invoked procedures, these algorithms are also used to identify which of the procedure's parameters and global variables are sometimes used and always used as input and output. This information is then used to annotate all nodes representing the invocation of this procedure to enable analysis of the higher-level procedures.
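The following is a minimal sketch, in Python, of the uninitialized-variable check just described, applied to a single annotated flow graph. The graph representation, the tiny example procedure, and the variable names are invented for illustration; a real analyzer uses the efficient optimization-style algorithms described above rather than this simple path enumeration.

# Flow graph of a small procedure: each node lists the variables it defines
# and references; EDGES gives the possible successors of each node.
NODES = {
    1: {"defs": set(),  "refs": set()},        # entry: X and Y start out undefined
    2: {"defs": {"X"},  "refs": set()},        # X := ...
    3: {"defs": set(),  "refs": {"X", "Y"}},   # write(X + Y)
    4: {"defs": {"Y"},  "refs": set()},        # Y := ...
}
EDGES = {1: [2, 4], 2: [3], 4: [3], 3: []}
ALL_VARS = {"X", "Y"}

def find_use_before_def(entry):
    """Search the acyclic paths from the entry node for a reference that can be
    reached without an intervening definition of the referenced variable."""
    problems = []
    stack = [(entry, set(ALL_VARS), [entry])]   # (node, variables still undefined, path so far)
    while stack:
        node, undefined, path = stack.pop()
        for var in NODES[node]["refs"] & undefined:
            problems.append((var, path))
        live = undefined - NODES[node]["defs"]
        for succ in EDGES[node]:
            if succ not in path:                # ignore cycles in this simple sketch
                stack.append((succ, live, path + [succ]))
    return problems

for var, path in find_use_before_def(1):
    print(f"variable {var} may be referenced before definition along path {path}")

Running the sketch reports that Y may be referenced before definition along the path 1-2-3 and that X may be referenced before definition along the path 1-4-3, each with its sample path, which is the style of output described above.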
Data flow analysis might also be applied to the detection of illegal sequences of file operations in programs written in such languages as COBOL. Here the operations of interest would be opening, closing, defining (i.e., writing), and referencing (i.e., reading) a file. Errors whose presence or absence could be determined would include:
·         Attempting to use an unopened file.
·         Attempting to use a closed file.
·         Reading an empty file.
Effectiveness. As noted, this technique is capable of determining the absence of event sequence errors from a program or their presence. When an event sequence error is detected, it is always detected along some specific path. Because these techniques do not study the executability of paths, the error may be detected on an unexecutable path and give rise to a spurious message. Another difficulty is that this technique is not reliable in distinguishing among the individual elements of an array. Therefore, arrays are usually treated as if they were simple variables. As a consequence, illegal sequences of operations on specific array elements may be overlooked.
Applicability. Data flow analysis is applicable during unit and module testing phases. Its applicability is limited only by the availability of the (considerable) tooling needed to construct the required flow and call graphs.
Maturity. This technique has proved effective and efficient and is rapidly appearing as a capability in many state-of-the-art testing tools.
User Training. This technique requires only a familiarity with and understanding of the output message. No input data or user interaction is required.
Costs. This technique requires computer execution. The algorithms employed, however, are highly efficient, generally executing in time that is linearly proportional to program size. Experience has shown that the construction of the necessary graphs can be a considerable cost factor.
Because no human input or interaction is required except for interpretation of results, staff time costs are low.
STRUCTURE ANALYSIS AND DOCUMENTATION
This section contains a description of a test technique that detects errors concerning the control structures (e.g., modules, subroutines, branches) and code structures (e.g., iterations) of a language. In addition, this section contains descriptions of various reports that are generated as a by-product of static analysis.
Structure Analysis
Application of structure analysis to either code or design allows detection of some types of improper subprogram use and violation of control flow standards. Structure analysis is also useful in providing required input to data flow analysis tools and is related in principle to two code auditing techniques.
Information Input. Two input items are required by structure analysis. The first is the text of the program or design to be analyzed. The text is usually provided to the analyzer as high-order language source code, but in some cases it is provided in an intermediate form (i.e., after scanning and parsing but before object code generation).
The second input item is a specification of the control flow standards to be checked. These standards are often completely implicit in that they may be part of the rules for programming in the given language or design notation. An example of such a rule is that subprograms may not be called recursively in FORTRAN. Individual projects, however, may establish additional rules for internal use. Many such rules (e.g., limiting the number of lines allowed in a subprogram) can be checked by code auditing. Others, however, can require a slightly more sophisticated analysis and are therefore performed by structure analysis. Two examples in this category are "all control structures must be well nested" and "backward jumps out of control structures are not allowed."
Information Output. Error reports and a program call graph are the most common output items of structure analysis. Error reports indicate violations of the standards supplied for the given input. Call graphs indicate the structure of the program with respect to its use of subprograms; associated with each subprogram is information indicating all routines that call the subprogram and all routines that are called by it. The presence of cycles in the graph (A calls B calls A) indicates possible recursion. Routines that are never called are evident, as are attempts to call nonexistent routines. In addition, the presence of structurally dead code is identified.
In checking adherence to control flow standards, the technique provides a flow graph for each program unit. The flow graph represents the structure of the program with each control path in the program represented by an edge in the graph.
The flow graph and the call graph are items required as input by data flow analysis, and it is common for the two analysis capabilities to be combined in a single automated tool.
Outline of Method. Because structure analysis is an automatable static analysis technique, little user action is required. Aside from providing the input information, the user need only study the output reports and determine whether program changes are required. Some simple implementations of the tool may not provide detailed analysis reports and therefore rely on the user to examine the output (e.g., inspecting the call graph for the presence of cycles).
Example. As an example of a structural implementation error, an online management information system program calls a routine MAX to report the largest stock transaction of the day for a given issue. If MAX does not have the necessary information already available, RINPUT is called to read the required data. Because RINPUT reads many transactions for many issues, a sort routine is used to help organize the information before returning it to the caller. As a result of a keypunch error, the sort routine calls routine MAX (instead of MAX1) to help in the sorting process. This error can be detected as a cycle in the call graph (see Figure 2) or by locating all disconnected flow graph components; it is reported through the use of structure analysis.
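The cycle in this example can be found mechanically. The following is a minimal sketch (illustrative only, not a description of any specific structure analyzer) that flags possible recursion and never-called routines in a call graph; the routine names and the call graph itself are assumptions that loosely mirror the MAX/RINPUT example.

def find_cycle(calls, root):
    # Depth-first search for a cycle in the call graph reachable from root.
    def visit(node, stack):
        if node in stack:
            return stack[stack.index(node):] + [node]
        for callee in calls.get(node, []):
            cycle = visit(callee, stack + [node])
            if cycle:
                return cycle
        return None
    return visit(root, [])

def never_called(calls, root):
    # Routines present in the graph but unreachable from the root routine.
    reachable, work = set(), [root]
    while work:
        node = work.pop()
        if node not in reachable:
            reachable.add(node)
            work.extend(calls.get(node, []))
    return set(calls) - reachable

# MAX calls RINPUT, RINPUT calls the sort routine, and (because of the
# keypunch error) the sort routine calls MAX again -- a cycle.
calls = {"MAIN": ["MAX"], "MAX": ["RINPUT"], "RINPUT": ["SORT"],
         "SORT": ["MAX"], "MAX1": []}
print(find_cycle(calls, "MAIN"))     # ['MAX', 'RINPUT', 'SORT', 'MAX']
print(never_called(calls, "MAIN"))   # {'MAX1'}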


An example of a violation of control flow standards is as follows. As part of the programming standards formulated for a project, the following rule is adopted: all jumps from within a control structure must be to locations after the end of the structure. The following segment of Pascal contains a violation of this rule that would be reported:
100: X := 100;
while X < 70 do
          begin
          ...
          if Z = 5 then goto 100;
          ...
          end;
Effectiveness. Structure analysis is completely reliable for detecting structure errors and violations of the standards specified as input. The standards, however, only cover a small range of programming standards and possible error situations. Therefore, this technique is useful only in verifying extremely basic program properties. This technique's prime utility is during the early stages of debugging a design or code specification to detect unreachable code, infinite loops, and recursive procedure calls; it can also provide helpful documentation on the procedure-calling structure of large programs.

Figure 2. A Call Graph
Applicability. This technique may be applied from algorithm confirmation through integration test. Particular applicability is indicated in systems involving large numbers of subprograms or complex program control flow.
Maturity. Structure analysis is automated and has been shown to be effective in detecting structural errors and violations of specified standards.
User Training. Minimal training is required for use of the technique.
Costs. Little staff time is involved because no significant time is spent in preparing the input or interpreting the output. Computer resources are small because the processing required can be done efficiently and only a single run is required for analysis.
Documentation
A by-product of many static analysis techniques is the set of reports produced by automated testing tools. This information can be used to help document the program being analyzed. The following are descriptions of typical reports:
·         Global cross-reference report indicating input/output usage for variables in all modules.
·         Module invocations report indicating the calling modules and showing all calling statements.
·         Module interconnection report showing the program's module-calling structure.
·         Special global data reports for variables in global areas (e.g., COMMON blocks and COMPOOLs).
·         Program statistics, including total size, number of modules, module size distribution, statement type distribution, and complexity measures.
·         Summaries of analysis performed, program statistics, and errors and warnings reported.
PROGRAM QUALITY ANALYSIS
This section contains descriptions of complexity ratings and quality measurements. To fully explain the methodology of these topics is beyond the scope of this appendix. A brief description is supplied here as well as a list of references.
Halstead's Software Science
Halstead's software science involves several measures of software quality. Halstead combines the disciplines of experimental psychology and computer science into a theory of software complexity. His theory uses four quantities: simple counts of distinct operators and operands and the total frequencies of operators and operands. With these quantities, Halstead developed measures for overall program length, the potential (smallest) volume of an algorithm (i.e., the most succinct form in which an algorithm could ever be expressed), the actual volume of an algorithm in a particular language, program level (i.e., the difficulty of understanding a program), language level (i.e., a measure of the relative ease of encoding an algorithm in a specific source language), programming effort (the number of elementary discriminations necessary to encode a specific algorithm), program development time, and predicted bug rates in a system.
Many believe that Halstead's theory of program complexity contains theoretical and practical weaknesses. Despite these opinions, Halstead's metrics have been applied with positive results. Hamer and Frewin, however, claim that the "experimental support is largely illusory."
Information Input. The input to Halstead's measure is the source code.
Information Output. The output of Halstead's measure varies depending on the desired analysis.
Outline of Method. Computing the various measures (metrics) requires counting the number of operands and operators in the algorithm or source code, depending on the metric. Halstead, however, was never precise about what should be classified and counted as an operator or an operand; this vagueness has caused much confusion in interpreting the results of various researchers.
At present, researchers select counting rules that work in their environment; no consistent rules work in all environments. Some metrics seem stable, almost insensitive to the counting rules used to compute them; others vary widely with even minor counting rule changes.
Halstead's book is notably vague on practical implementation of the theory, and Halstead died before these questions were resolved. For specific information on methodology, it is necessary to examine the work of others who have applied Halstead's metrics.
Example. This section demonstrates how the difficulty-level metric is applied to a simple program.
The difficulty-level metric asserts that a program with a high level of difficulty is likely to be more difficult to construct and therefore is likely to be error-prone. The equation for this metric is:
                          L = (2/n1) * (n2/N2)
where:
L = the level of program difficulty
n1 = number of unique or distinct operators
n2 = number of unique or distinct operands
N2 = total use of all the operands

This equation is applied to the following program:

C       COMPUTE FIBONACCI SEQUENCE
C       M1 AND M2 ARE CONSECUTIVE TERMS

M1    = -1
M2    = 1
DO   10 KTR = 1, 30, 2
                   M1 = M1 + M2
                   M2 = M1 + M2
                   PRINT, '(', KTR, ')', M1
                   K = KTR + 1
                   PRINT, '(', K, ')', M2
10      CONTINUE

Table 1 contains the operator and operand counts for this program. Substituting these counts into the level-of-difficulty equation yields:
                                  L = 0.11
Operator and Operand Counts

Operator     j    f(1, j)        Operand    j    f(2, j)
=            1    6              M1         1    5
DO           2    1              M2         2    5
+            3    3              KTR        3    3
PRINT        4    2              K          4    2
CONTINUE     5    1

n1 = 5       N1 = 13             n2 = 4     N2 = 15

Table 1
where:
f(1, j) = number of occurrences of the jth most frequently occurring operator, where j = 1, 2, ..., n1
f(2, j) = number of occurrences of the jth most frequently used operand, where j = 1, 2, ..., n2
In conclusion, the Fibonacci program is rated a relatively low level of difficulty (.11); therefore, it follows that the program is simple to construct and is not likely to be error-prone. This is a plausible conclusion given the sample program.
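For readers who want to reproduce the figure, the following is a minimal sketch (in Python, which is not part of the original methodology) of the arithmetic; it assumes the counting of operators and operands has already been done, which is the hard and contentious part.

def difficulty_level(n1, n2, N2):
    # L = (2/n1) * (n2/N2), using the counts of distinct operators (n1),
    # distinct operands (n2), and total operand usage (N2).
    return (2.0 / n1) * (n2 / N2)

# Counts from Table 1 for the Fibonacci program.
L = difficulty_level(n1=5, n2=4, N2=15)
print(round(L, 2))   # 0.11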
Effectiveness. A subset of Halstead's measures has been shown to be qualitatively effective. A major difficulty with this technique is that no standards are imposed on the counting rules. Further research in this area should yield a more practical and usable methodology.
Halstead's theory has been the subject of considerable evaluation research with such measures as predicting the number of bugs in a program, the time required to implement a program, debugging time, and algorithm purity.
Applicability. A subset of the various metrics Halstead has proposed has been applied with positive results. Future use of Halstead's metrics should center on those metrics previously studied to provide a methodology and guideline to follow with respect to counting rules. This technique is applicable during unit and module test phases.
Maturity. Halstead's software science is still in the developmental and experimental stages because of its theoretical weaknesses and the lack of standardized counting rules. The technique has been used with varying results.
User Training. To apply Halstead's metrics, knowledge of previous researchers' works is necessary to gain an understanding of the methodology.
Costs. This technique is laborious to implement unless it is automated. The counting of operators and operands is a tedious and time-consuming task.
McCabe's Cyclomatic Number
McCabe's cyclomatic number measures the complexity of a source code module by examining its control flow constructs (e.g., IF-THEN-ELSE, DO-WHILE, DO-UNTIL, and CASE statements). It provides a concise means of gauging the number of paths through, and the size of, a module of code and therefore of locating error-prone sections of code.
Information Input. The source code or a detailed design document containing identifiable control flow statements can be the input to this technique.
Information Output. The output is the cyclomatic number, which indicates the complexity of the code. Tools can be used that would provide other insights into the complexity of the code (e.g., call graphs).
Outline of Method. The methodology of this technique involves counting the number of edges, vertices, and connected components of a graph of the program code. The cyclomatic number is then defined as:
                           V(G) = e - n + 2p
where:
e = number of edges
n = number of vertices
p = number of connected components
Figure 3. Control Graphs of Language Structures
Figure 3 shows control graphs of constructs found in structured programming. The edges are a count of the lines; the vertices, a count of the circles and decision symbols. The number of connected components, p, is greater than 1 when the complexity measure is applied to a collection of programs. For example, if a main program, M, calls subroutines A and B, the control structures would look like Figure 4.
Figure 4. Control Structures
McCabe has suggested that an upper bound of V(G) ≤ 10 should be maintained for a module of code. Because the cyclomatic number is based on the examination of control flow, it is a measure of the number of paths through a module of code. By maintaining V(G) at 10 or less, control of the number of paths through a module is gained and module testing is facilitated. When V(G) > 10, the number of paths through the code should be reduced. The control graph provides a visual picture of the code structure; therefore, it is also helpful in determining where to break up the code.
Two methods to simplify the complexity calculation have been proven. Assuming p = 1, V(G) = Pi + 1, where Pi = the number of predicates (condition statements). This method requires only a count of the predicates in the code; it is not necessary to extract a control graph from the code, although one is a helpful aid. The second method involves a visual inspection of the control graph: given the control graph of a program, a count of its regions is equal to V(G).
All three of the methods to calculate the complexity of a program are useful. In some cases, it is simpler to count the number of predicates; in other cases, the regions or the edges, vertices, and connected components.
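The following is a minimal sketch (illustrative values assumed) showing two of the equivalent calculations side by side: the edge/vertex/component formula and the predicate count.

def cyclomatic_from_graph(edges, vertices, components=1):
    return edges - vertices + 2 * components     # V(G) = e - n + 2p

def cyclomatic_from_predicates(predicates):
    return predicates + 1                         # V(G) = Pi + 1, assuming p = 1

# A single IF-THEN-ELSE: 4 edges, 4 vertices, 1 component, 1 predicate.
print(cyclomatic_from_graph(edges=4, vertices=4))        # 2
print(cyclomatic_from_predicates(predicates=1))          # 2

# The quadratic equation solver of Table 2 contains 6 predicates.
print(cyclomatic_from_predicates(predicates=6))          # 7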
Example. For the following example, counting the number of predicates is the simplest method of calculating the complexity number. This method was applied to a quadratic equation solver program (see Table 2).
No. of Predicates        Program Code

0         C      A*X**2+B*X+C=0
0         C      QUADRATIC EQUATION SOLVER
0                READ, A, B, C
1                IF (A.EQ.0.0) THEN
2                       IF (B.EQ.0.0) THEN
3                              IF (C.EQ.0.0) THEN
0                                     PRINT, 'TRIVIAL'
0                              ELSE
0                                     PRINT, 'IMPOSSIBLE'
0                              END IF
0                       ELSE
0                              X = -C/B
0                              PRINT, 'SINGLE ROOT'
0                              PRINT, X
0                       END IF
0                ELSE
0                       DISC = B**2 - 4.0*A*C
4                       IF (DISC.GT.0.0) THEN
0                              PRINT, 'REAL ROOTS'
0                              X1 = (-B+SQRT(DISC))/(2.0*A)
0                              X2 = (-B-SQRT(DISC))/(2.0*A)
0                              PRINT, X1, X2
0                       END IF
5                       IF (DISC.EQ.0.0) THEN
0                              PRINT, 'MULTIPLE ROOT'
0                              X = -B/(2.0*A)
0                              PRINT, X
0                       END IF
6                       IF (DISC.LT.0.0) THEN
0                              PRINT, 'COMPLEX ROOTS'
0                              X = -B/(2.0*A)
0                              Y = SQRT(ABS(DISC))/(2.0*A)
0                              PRINT, X, '+', Y, 'I'
0                              Y = -Y
0                              PRINT, X, '+', Y, 'I'
0                       END IF
0                END IF

Table 2
The cyclomatic number, V(G), is defined as:
                             V(G) = Pi + 1
where Pi = the number of predicates. This equation applied to the example yields V(G) = 7. Because V(G) for this example is less than 10, the program does not need modification. In other words, the complexity of the program is not great; the number of paths through the code is considered manageable in the context of testing.
Effectiveness. The cyclomatic number is effective in determining and controlling the number of program paths in a module of code. There is growing evidence that a direct relationship exists between code complexity and the number of errors in a segment of code and the time to find and repair such errors. The cyclomatic number therefore facilitates manageable testing of the code.
Applicability. The cyclomatic number can be applied to high-order languages that allow easy detection of control structures (e.g., IF, DOUNTIL, DOWHILE, CASE). When used simultaneously with the coding phase of the software development life cycle, the metric provides an effective way to gauge the size of a module.
Maturity. McCabe's complexity measure has progressed beyond the developmental and experimental stage and is widely used. Although there is an indication of a positive relationship with cost and number of errors, this relationship has not been quantitatively studied.
User Training. If this technique is automated, there is little work for the user other than interpreting the results. A basic understanding of McCabe's theory is necessary.
Costs. If this technique is not automated, the code must be visually inspected for control structures. Depending on the amount of code to be examined, this could be time consuming.

Software Quality Measurement
Software quality measurement is a method of assessing the quality of software (e.g., reliability, maintainability, testability, correctness). The metrics are used to measure the presence of the desired qualities at key review points during the development process to allow periodic prediction of the quality level for the final product.
Information Input. The necessary input is all tangible software development products (e.g., specifications, standards, code).
Information Output. The output of this technique consists of a compliance document, which presents the relationship between the metric scores and the specified requirements. This document is presented at key review points and provides a picture of the software quality trend over time.
Outline of Method. There are two ways to implement software quality metrics. The first method is to apply the metrics during the full-scale development of a product. An outline of this methodology is as follows:
·         Select quality factors--Software quality metrics can be explained by the framework in Figure 5. Software quality factors are user-oriented goals (e.g., reliability, maintainability). These factors could be required by contract or a result of some project goal.
·         Select factor attributes--Depending on the particular system (e.g., uniprocessor, distributed), only a subset of the software factor attributes applies.
·         Establish quantifiable goals--This step requires establishing realistic goals (e.g., a reliability level of 0.98).
The remaining steps in this methodology are common to both methods; the second method evaluates a product that has already been completed. The following steps are applied regardless of the method chosen:
·         Select the software products--Obtain the input (i.e., documents and code).
·         Select and fill out metric worksheets--The worksheets contain questions concerning a specific software criterion. For example, a metric that measures the simplicity criterion asks whether the design is organized in a top-down hierarchical fashion. The metric worksheets are applied to the software products or input as previously listed.
·         Select score sheets and score the elements, metrics, and criteria/factors--Metric worksheets are the input to the score sheets. Score sheets combine the metrics to form criteria scores; the criteria scores are then combined to form a factor score (see Table 3).
·         Perform data analysis and generate reports--Evaluate the scoring for trends in the data. Unusual scores should be checked for reasonableness at this point.
·         Analyze results--The results indicate whether any corrective actions must be taken.

Figure 5. Software Quality Metrics Framework
Example. The methodology can be broken down into a specification phase and an evaluation phase. The first phase involves selecting quality factors and attributes and establishing quantifiable goals. The second phase involves scoring the data within the framework of the specifications and analyzing the results.
Specification Phase. The software quality factor reliability is selected for this example. Reliability is defined as the probability that the software performs its logical operations in the specified environment without failure. The attributes or criteria associated with the quality factor reliability, and the metrics associated with those criteria, are listed in Table 4. A quality goal of 0.90 is chosen.
Evaluation Phase. The next step of the evaluation process involves scoring the metrics, criteria, and quality factor. This information is given in Table 3. Because the measured score for reliability (.83) is less than the desired reliability level of 0.90, corrective actions must be taken. A glance at the data given in Table 3 indicates that there are two suspect values: AM.5 and SI.2. These unusual scores could result from incorrect metric data or from a poor-quality product (software or documentation). If the metric data is correct, either the product must be improved or the overall quality goal is unrealistic and must be lowered.
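The roll-up itself is simple arithmetic. The following is a minimal sketch using the reliability data of Table 3; equal-weight averaging is assumed as the scoring rule, whereas actual score sheets may weight metrics differently, which is why the computed simplicity score differs from the published .33.

metrics = {
    "AC": {"AC.1": 0.87},
    "AM": {"AM.1": 0.85, "AM.2": 0.94, "AM.3": 0.98, "AM.4": 0.82,
           "AM.5": 0.26, "AM.6": 0.82, "AM.7": 0.89},
    "SI": {"SI.1": 0.89, "SI.2": 0.35, "SI.3": 0.97, "SI.4": 0.89,
           "SI.5": 1.00, "SI.6": 0.85},
}

def average(values):
    values = list(values)
    return sum(values) / len(values)

criteria = {name: average(scores.values()) for name, scores in metrics.items()}
factor = average(criteria.values())

goal = 0.90
print("reliability factor score: %.2f (goal %.2f)" % (factor, goal))
if factor < goal:
    # Flag unusually low metric scores for a reasonableness check,
    # as the data analysis step suggests (here AM.5 and SI.2 stand out).
    for scores in metrics.values():
        for name, score in scores.items():
            if score < 0.5:
                print("suspect metric:", name, score)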
Effectiveness. Software quality metrics is an emerging technology and has had little field use to validate its effectiveness.
Applicability. This technique is applicable to a product under development, from algorithm confirmation to verification/CPCI test and to an existing product. Software quality measurement could possibly be applied to the remaining test phases, depending on the particular needs of the user. The metrics are primarily applicable to high-order languages.
Maturity. Evaluation of software quality through metrics is new; therefore, it is not widely used.
User Training. An implementor of this technique should be familiar with software design development and coding.
Costs. Emphasis on a high software quality goal can sometimes result in cost savings. For example, using metrics to evaluate reliability results in early detection of errors and, therefore, cost savings. The benefits of some quality factors may not be realized until later in the life cycle (e.g., reusability). An emphasis on reusability will aid future projects.
Quality Factor Reliability Criteria

Accuracy (AC)--Those characteristics of the software that provide the required precision in calculations and outputs. The metric associated with accuracy is:
AC.1 – Accuracy Checklist

Anomaly Management (AM)--Those characteristics of the software that provide for continuity of operations under, and recovery from, nonnominal conditions. The metrics associated with anomaly management are:
AM.1 – Error Tolerance/Control Checklist
AM.2 – Improper Input Data Checklist
AM.3 – Computational Failures Checklist
AM.4 – Hardware Faults Checklist
AM.5 – Device Errors Checklist
AM.6 – Communication Errors Checklist
AM.7 – Node/Communication Failures Checklist

Simplicity (SI)--Those characteristics of the software that provide for the definition and implementation of functions in the most noncomplex and understandable manner. The metrics associated with simplicity are:
SI.1 – Design Structure Measure
SI.2 – Structured Language or Preprocessor
SI.3 – Data and Control Flow Complexity Measure
SI.4 – Coding Simplicity Measure
SI.5 – Specificity Measure
SI.6 – Halstead's Level of Difficulty Measure

Table 4


Metric, Criteria, and Quality Factor Scores

Metric Name    Metric Score    Criteria Score    Factor Score
AC.1           .87             .87 (AC)
AM.1           .85
AM.2           .94
AM.3           .98
AM.4           .82             .79 (AM)          .83
AM.5           .26
AM.6           .82
AM.7           .89
SI.1           .89
SI.2           .35
SI.3           .97             .33 (SI)
SI.4           .89
SI.5           1.00
SI.6           .85

Table 3

INPUT SPACE PARTITIONING
This section contains descriptions of path analysis test techniques. These techniques are characterized by the partitioning of the input space into path domains (subsets of the program input domain) that cause execution of the program's paths. These techniques have been shown to be generators of high-quality test data, although current technology limits their use to programs that have a small number of input variables. They are not suitable for widespread use because of the complexity of the methodology; however, less rigorous interpretations of the methodologies are generally used.
Path Analysis
Path analysis generates test data that causes selected paths in a program to be executed. This method presents a procedure for selecting a representative subset of the total set of input data of a program.
Information Input. The input needed for this technique is the source code.
Information Output. The output of this technique is a set of test data that is sensitive to computation errors, path errors, and missing path errors.
Outline of Method. The methodology of path analysis consists of five phases.
Phase I. The program is analyzed and descriptions of the standard classes of paths are constructed. A given program can have an infinite number of paths. Fortunately, there are strategies that provide a theoretical basis for selecting the best possible subset of paths; attempting to test paths without a clear idea of which paths constitute a representative set is ineffective. The boundary-interior method is a strategy for choosing a group of paths through a program to build a finite set of classes such that a test of one path from each class constitutes an intuitively complete set of tests. In this phase, class descriptions of a program are defined through the use of the boundary-interior method. The complete set of class descriptions for a program can be represented in the form of a description tree. Figure 6 contains the boundary-interior class description tree for the program in Figure 7. The leftmost path in the tree describes the class of all paths that test the interior of the loop in the program. The other paths are boundary tests.
Phase II. Descriptions of the sets of input data that cause the different standard classes of paths to be followed are constructed. The predicates in a program are its condition statements. The predicates in a path, together with the input and computational statements that affect the variables in the predicates, form an implicit description of the subset of the input domain that causes that path to be followed. Figure 8 contains the implicit input data description for the interior test class description in Figure 6. The input statement READ N is replaced by the assignment N ← #1, where #1 symbolizes a dummy input variable.
Figure 6. Description Tree


Figure 7. Factorial Program

Figure 8. Implicit Input Data Description

Figure 9. Partially Explicit Description

Figure 10. Explicit Description
Phase III. The implicit descriptions are transformed into equivalent partially explicit descriptions. This phase uses a symbolic interpretation process that attempts to evaluate and delete the assignment statements and FOR-loops in an implicit description; therefore, this phase results in the generation of partial explicit descriptions. The generation of complete explicit descriptions is prevented by the presence of array references and loops in implicit descriptions. Figures 9 and 10 show the partially explicit description and the explicit description, respectively.
The assignment N ← #1 has been evaluated and deleted, and the symbolic value #1 is substituted for N in the predicates N = 0 and N ≠ 0. The assignment N ← N - 1 has also been evaluated. The symbolic value #1 - 1 of N cannot be substituted for occurrences of N in the FOR-loop because N is assigned a value in the loop; the assignment N ← #1 - 1 must be retained to denote the value of N on entry to the FOR-loop.
The loop in the partially explicit description (Figure 9) can be replaced by the predicate (N < 0 ∨ N > K1 - 1) and the assignment N ← N - K1.
Figure 11. Partially Explicit Subset Description

Figure 12. Explicit Subset Description
Phase IV. Explicit descriptions are constructed of subsets of input data sets for which the third phase was unable to construct explicit descriptions. Each of the explicit and partially explicit descriptions generated by phase III describes a set of input data. Phase IV constructs explicit descriptions of subsets of the sets that are described by partially explicit descriptions. Subset descriptions are constructed by traversing the FOR loops in partially explicit descriptions. Partially explicit subset descriptions can be constructed by choosing particular values of K1 in Figure 9. Figure 11 contains the subset description corresponding to the choice of K1 = 0. Figure 11 is evaluated to produce explicit subset descriptions (see Figure 12).
Phase V. The input values that satisfy explicit descriptions are generated. An integrated collection of inequality solution techniques is applied to the complete descriptions to generate test cases. These techniques are applied to both the explicit descriptions, which are generated by phase III of the methodology, and to the subset descriptions generated by phase IV. The explicit subset description in Figure 12 can be easily solved through the use of a method for linear systems in one variable (see Figure 13).
Figure 13. Explicit Subset Description Solution
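The following is a minimal sketch of the Phase V step only; the constraints shown are illustrative assumptions standing in for the explicit description of Figure 12, and a brute-force scan over a bounded integer range stands in for the integrated inequality solution techniques.

def solve_one_variable(constraints, lo=-100, hi=100):
    # Return a value of the dummy input #1 satisfying every constraint,
    # or None if no value in the scanned range does.
    for value in range(lo, hi + 1):
        if all(check(value) for check in constraints):
            return value
    return None

# Illustrative explicit description: #1 >= 0 and #1 - 1 < 1, i.e. #1 in {0, 1}.
constraints = [lambda v: v >= 0, lambda v: v - 1 < 1]
print(solve_one_variable(constraints))   # 0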
Example. An example of this technique is given in the preceding section.
Effectiveness. This technique detects computation, path, and missing path errors. Only phases I and II have been implemented and shown to be effective. Generation of test data, however, cannot be guaranteed for classes of paths containing branch predicates that involve subroutines, functions, or nonlinear expressions in several variables.
Applicability. Path analysis is applicable during unit, module, and integration test phases to high-order languages.
Maturity. This technique is still in the early stages of its development.
User Training. A firm understanding of the underlying methodology is necessary to implement this technique.
Costs. Staff time is the main cost factor of this technique.
Domain Testing
Domain testing detects errors that occur when a specific input follows the wrong path because of an error in the control flow of the program. The strategy is based on a geometrical analysis of the input domain space and uses the fact that points on or near the border of the input space are most sensitive to domain errors.
Information Input. The program code and the minimum-maximum constraints imposed on the input variables are the required input. In most applications, the minimum-maximum range of the input values is known.
Information Output. This technique detects errors in condition statements (i.e., incorrect relational operator or incorrect operand).
Outline of Method. An input space is n-dimensional, where n is the number of input variables. The input space structure is a geometrical representation of the input space. Given the input and the minimum and maximum constraints imposed on the input variables, an input space structure can be constructed. This structure gives a pictorial representation of the domains of the input variables.
Test points are generated for each border segment identified. Border segments are sections of the input space that are delimited by the predicates in the program. These test points determine if both the relational operator and the position of the border are correct.
The following process identifies those test points sensitive to domain testing errors:
·         A predicate in the program is chosen that may be in error.
·         The assumed correct predicate is determined.
·         The outcome of the program with the original predicate is compared with the outcome of the program with the assumed correct predicate.
·         The test points that result in unequal outcomes are the test points sensitive to domain errors.
The number of required test points grows linearly with both the dimensionality of the input space and the number of path predicates.
Given the input listed previously, an input structure can be drawn. This structure is a graphical perspective of the relationship between the input domain and the control paths through a program. The input domain is a set of input data points satisfying a path condition.
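The following is a minimal sketch of the kind of test points domain testing selects for one linear border; the predicate x + y <= 10 and the epsilon used to step just outside the border are illustrative assumptions, not from the original text.

def border_test_points(a, b, c, eps=0.01):
    # Test points for the border a*x + b*y = c of the predicate a*x + b*y <= c:
    # two points on the border and one point just outside it.
    on1 = (0.0, c / b)
    on2 = (c / a, 0.0)
    off = (c / (2 * a) + eps, c / (2 * b) + eps)
    return [on1, on2, off]

# Border of the path predicate x + y <= 10.
for x, y in border_test_points(a=1.0, b=1.0, c=10.0):
    print((x, y), "satisfies predicate:", x + y <= 10.0)

If the implemented border has the wrong relational operator or is shifted, at least one of these points is classified differently by the original and the assumed correct predicate, which is exactly the comparison described in the list above.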
Example. The inclusion of an example of domain testing is beyond the scope of this appendix because of the complexity of the methodology.
Effectiveness. Domain testing is effective in detecting incorrect relational operators and incorrect operands in condition statements.
Applicability. This technique is applicable during the unit or module test phase, and the program should be written in an Algol- or Pascal-like language; the control structures should be simple and concise.
Arrays, subroutines, and functions are programming language features that are not currently implementable. Further research is needed to determine whether these features pose any fundamental limit to the domain testing strategy.
Maturity. Domain testing strategy is still in the developmental or experimental stages and for this reason is not widely used.
User Training. No specific knowledge is needed to do domain testing other than familiarity with the methodology.
Costs. The primary costs of the domain testing strategy involve time spent examining and analyzing the test cases. This process could be automated through the use of an input-output specification.
Partition Analysis
Partition analysis uses information from both the specification and the implementation to test a procedure. Symbolic evaluation techniques are used to partition the set of input data into subdomains so that the elements of each subdomain are treated uniformly by the specification and the implementation. The information attained from each subdomain is used to guide the generation of test data and to verify consistency between the specification and the implementation.
Information Input. The necessary input is the program specifications written in a formal specification language and the implementation (i.e., code).
Information Output. The output comprises a set of test data and a measure of the consistency of the implementation with the specification.
Outline of Method. Partition analysis is divided into two types of testing: partition analysis verification, which tests for consistency between the specification and the implementation, and partition analysis testing, which generates test data. An introduction to the terminology is presented to explain the methodology of this testing technique.
The domain of a procedure (or module) can be partitioned into subdomains by examining the specification and the implementation. Symbolic evaluation can be applied to the specification to create subdomains, which are called subspecification (subspec) domains. Each path through the implementation represents a subdomain called a path domain. As a result, a partition of the whole procedure can be constructed by overlaying these two sets of subdomains. The resulting subdomains are called procedure subdomains.
Partition Analysis Verification. This test provides a means to examine the consistency between the specification and the implementation. Three properties are associated with consistency: compatibility, equivalence, and isomorphism. The specification and the implementation are consistent only if all three properties are present. Compatibility is shown if the specification and the implementation demonstrate uniformity in their declarations of parameters and global variables; the input and output domains of the implementation must agree with those of the specification. Equivalence is shown when the subspec domains are equal to the path domains. Isomorphism is shown if there is a one-to-one correspondence between the subspec domains and the path domains.
Partition Analysis Testing. This technique provides a method of generating test data that is sensitive to computation and domain errors. An examination of the representations of the subspec domains and path domains exposes computation errors. To detect domain errors, an examination of the representations of procedure subdomains is required.
Example. Because of its complex nature, an example of partition analysis is beyond the scope of the appendix.
Effectiveness. Partition analysis can detect missing path errors, incorrect relational operators in condition statements, domain errors, and computational errors. Initial experimentation with this technique has provided positive results; however, more experimentation is needed to demonstrate the reliability of this method.
Applicability. This technique is applicable during unit and module test phases. A formal specification language must be used.
Maturity. Partition analysis is still in the developmental and research stage. Additional testing is necessary for this technique to become widely used.
User Training. Familiarity with symbolic expression of a procedure is helpful.
Costs. Staff time is the main cost involved in using this technique.
Data-Flow Guided Testing
Data-flow guided testing is a method for obtaining structural information about programs that has found wide applicability in compiler design and optimization. Control flow information about a program is used to construct test sets for the paths to be tested. This specialized technique is not to be confused with data flow analysis, a technique that detects errors that can be represented by particular sequences of events in a program's execution (i.e., reading a file before it is opened).
Information Input. A control flow graph representation of the program is the necessary input for this technique.
Information Output. Data-flow guided testing provides information concerning code optimization in the following ways:
Available Expressions. An expression X opr Y (where opr is an operator) is available at a point p in a flow graph if every path from the initial node to p evaluates X opr Y and if, after the last such evaluation before reaching p, there are no subsequent assignments to X or Y. This allows redundant computation of some expressions within each node to be eliminated.
Live Variables. Given a point p in a flow graph, the identification of which variables are live at that point (i.e., which variables defined before this point are used at or after this point) provides useful information. An important use of this information is evident when object code is generated. After a value is computed in a register, and presumably used within a block, it is not necessary to store that value at the end of the block if it is dead. In addition, if all registers are full and another register is needed, using a register with a dead value is ideal because that value need not be stored.
Reaching Definitions. It is desirable to know what uses might be affected by the particular definition of a variable. This information can detect potentially undefined names by introducing a dummy definition of each name A preceding the initial node of the flow graph and determining whether the dummy definition of A reaches any block that has a use of A and does not define A before that use.
Very Busy Expressions. An expression B opr C is very busy at point p if along every path from p there is a computation of B opr C before any definition of B or C. If B opr C is very busy at p, it can be computed at p, even though it may not be needed there, by introducing the statement T := B opr C; all computations A := B opr C reachable from p are then replaced by A := T.
Outline of Method. There are two main methods for solving data flow problems: the interval approach and the iterative method. The interval approach collects relevant information by partitioning the program flow graph into subgraphs called intervals and replacing each interval by a single node containing information about the data items used, defined, and preserved within that interval. The iterative approach propagates data flow information in a simple iterative manner until all the required information is collected.
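The following is a minimal sketch of the iterative method applied to one classic data flow problem, reaching definitions; the flow graph, GEN/KILL sets, and definition names are illustrative assumptions.

def reaching_definitions(preds, gen, kill):
    # Iterate IN/OUT sets to a fixed point: OUT[n] = GEN[n] | (IN[n] - KILL[n]),
    # IN[n] = union of OUT over all predecessors of n.
    IN = {n: set() for n in preds}
    OUT = {n: set() for n in preds}
    changed = True
    while changed:
        changed = False
        for n in preds:
            IN[n] = set().union(*(OUT[p] for p in preds[n])) if preds[n] else set()
            new_out = gen[n] | (IN[n] - kill[n])
            if new_out != OUT[n]:
                OUT[n], changed = new_out, True
    return IN, OUT

# Nodes 1..4; d1 defines X at node 1, d2 redefines X at node 3,
# and node 4 is reachable from both node 2 and node 3.
preds = {1: [], 2: [1], 3: [2], 4: [2, 3]}
gen   = {1: {"d1"}, 2: set(), 3: {"d2"}, 4: set()}
kill  = {1: {"d2"}, 2: set(), 3: {"d1"}, 4: set()}
IN, OUT = reaching_definitions(preds, gen, kill)
print(sorted(IN[4]))   # ['d1', 'd2'] -- both definitions may reach node 4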
Example. Examples of solutions to data-flow guided testing problems can be found in the current literature.
Effectiveness. This technique has proved to be effective in detecting data flow problems. Examples of these types of problems are available expressions, live variables, reaching definitions, and very busy variables.
Applicability. Data-flow guided testing is generally used to help optimize program code. As a test technique, this method may be used during algorithm confirmation, design verification, unit testing, and module testing.
Maturity. This highly mature technique is primarily used as a code optimization method.
User Training. Familiarity with certain concepts and constructs of graph theory must be acquired to implement and understand this technique.
Costs. The principal cost associated with this technique is staff time. Fortunately, because this technique has been partially automated, staff time can be reduced.

DYNAMIC ANALYSIS TECHNIQUES

INSTRUMENTATION-BASED TESTING
Instrumentation-based testing techniques involve inserting statements or routines that record properties of the executing program but do not affect the functional behavior of the program. This section includes technique descriptions of path and structural analysis, performance measurement techniques, assertion testing, and interactive test and debugging aids.
Path and Structural Analysis
Path and structural analysis produce certain analytic information consisting of a listing of the segments of the program undergoing analysis and the number of times each segment executes when the program is executed. A segment is a portion of the program that may consist of statements, branches, complete execution paths, or paragraphs as in COBOL.
The goal of path and structural analysis is to increase the amount of code tested. It is usually impossible to test all the paths or combinations of branches in a large program. It is possible, however, to test all branches. For effective testing, all branches and as many paths as possible should be tested.
Instrumentation tools are used to determine how much coverage is achieved in a test run. Although these tools can also provide timing data, execution traces, and other information, the tester must formulate input data and decide whether the program has run correctly for each test.
Information Input. The input to this technique is the source program and an initial set of program test data. In addition, sophisticated coverage analyzers may require input parameters or commands that indicate which of several alternative coverage measures are to be used.
Information Output. At a minimum, the output is a listing of the code being analyzed and the number of times a segment executes when the program is executed.
In addition to coverage information, this analysis may also record and print variable range and subroutine call information. The minimum and maximum values assumed by each variable in a program, the minimum and maximum number of times that loops are iterated during the executions of a loop, and a record of each subroutine call may be reported.
Outline of Method. Typically, this analysis consists of two parts--a preprocessor that inserts code that collects the coverage information during execution (i.e., instrumenting the code) and a postprocessor that reduces data resulting from execution of the instrumented code and prepares reports and test results.
The code that collects coverage data can collect other information as well. The nature of this information depends on the tool used and the level at which the code is instrumented. If probes are inserted after every statement in the program, the entire history of the execution of a program can be recorded. Of course, instrumenting at the statement level incurs significant computer overhead. To determine branch coverage, it is necessary to insert probes only at every decision statement.
Structural analysis involves execution of:
·         The preprocessor (input equals the source program to be analyzed), to produce instrumented source code. This step results in the insertion of probes in the program for test coverage analysis requirements. The probes call special data collection routines or update matrices that record the execution profile of the program.
·         The compiler (input equals the instrumented source code), to produce object code, and the linkage loader, to produce the executable program complete with data collection routines.
·         The program, including execution of probes and data collection routines.
·         The postprocessor, to analyze the collected data and print the test results.
Three levels of structural analysis are possible: statement, branch, and path. Statement analysis, which is least rigorous, may be fulfilled 100% without executing all possible program branches. Path analysis, however, requires executing all possible paths in the program, which in some cases may not be achievable. Branch analysis is usually preferred.
There is no automated mechanism for error detection in structural testing. The user must recognize errors by looking at the program output. An obvious way to provide automatic assistance in error detection is to use executable assertions. Assertions are better than the execution traces produced by instrumentation tools for two reasons. First, assertions express relationships between variables rather than just reporting their values; in this way, error conditions can be checked automatically. Second, execution traces tend to produce large amounts of output, which wastes computer resources and annoys the user.
When errors are found in a program during the course of testing and code changes are made to correct them, the new version of the program must be run through the instrumentation tool again.
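The following is a minimal sketch of the idea behind branch-level instrumentation; it is hand-instrumented for illustration (a real preprocessor inserts the probes automatically), and the program under test, probe names, and test values are assumptions.

branch_counts = {}

def probe(branch_id):
    # Data collection routine called by every inserted probe.
    branch_counts[branch_id] = branch_counts.get(branch_id, 0) + 1

def classify(x):
    # The program under test, with a probe inserted on each branch
    # of its single decision statement.
    if x < 0:
        probe("classify:then")
        return "negative"
    else:
        probe("classify:else")
        return "non-negative"

# Run the tests, then report (postprocessing) which branches were never taken.
for value in (5, 12):                   # a test set that misses one branch
    classify(value)
all_branches = {"classify:then", "classify:else"}
print("executed:      ", branch_counts)
print("never executed:", all_branches - branch_counts.keys())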
Example. An example of test coverage analysis follows.
Application. A quicksort program was constructed that contains a branch to a separate part of the program code that carries out an insertion sort. The quicksort part of the code branches to the insertion sort whenever the size of the original list to be sorted or a section of the original list is below some threshold value. Insertion sorts are more effective than quicksorts for small lists and sections of lists because of the smaller constants in their execution time formulas.
Error. The correct threshold value is 11. As a result of a typographical error, the branch to the insertion sort is made whenever the length of the original list or the section of the list currently being processed is less than or equal to one.
Error Discovery. Parts of the insertion sort code are not executed unless the list or list section being sorted is of length greater than one. Examination of the output reveals that parts of the program are never executed, regardless of the program tests used. This alerts the programmer to the presence of the error. This error is not discoverable by the examination of test output data alone because the program still correctly sorts lists.
Effectiveness. Structural analysis and use of an instrumentation tool can provide the tester with the following information:
·         It can reveal untested parts of a program, so that new test efforts can be concentrated there.
·         Data on the frequency of execution of parts of a program, as well as the time required to execute them, can be tabulated--This information can be used to make the program more efficient through optimization techniques.
·         The range of values (high, low, average, first, last) assumed by a variable can be recorded and checked for reasonableness.
·         A trace of what has occurred at each statement in a section of code can be printed-This can be useful during debugging.
·         The data flow patterns of variables can be analyzed from the execution trace file-In this way errors and anomalies in the use of subscripted variables can be detected.
·         The degree to which the test cases exercise the structure of the program can be calculated. (Traditional testing methods typically exercise 30% to 80% of a program.)
Structural analysis is effective in detecting computation, logic, data-handling, and data output errors. This technique can help in the detection of omitted program logic and can assist in the detection of errors that occur only if a certain combination of segments is executed.
The percentage of errors detected in various programs by structural analysis ranges from 33% to 92%. The combination of structural and functional testing has proven to be effective.
This technique has been demonstrated as effective on both host and target systems.
Applicability. This analysis introduces a concept called structure-driven testing. Traditionally, testing has been requirements driven; test cases are developed largely to demonstrate that a program satisfies the functional requirements imposed on it. Functional or requirement-driven testing should not be superseded by the use of this technique but should be used in conjunction with it.
Structural analysis can be used with programs of any type of application. The timing information provided by instrumentation tools is useful in improving the efficiency of time-critical routines.
Structural analysis of large and complex programs is difficult, but these programs most need thorough test coverage. This testing technique has been shown to be effective early in the systems development life cycle, specifically during unit and module testing phases.
Maturity. Path and structural analysis is a highly mature approach to testing a program. Many instrumentation tools have been developed during the last 10 years and are available for most programming languages and computers.
User Training. There are no special training requirements for the use of this technique. The use of pre- and postprocessors is similar to the use of a compiler.
Costs. The most expensive part of structural analysis is the staff time needed to develop test data to achieve a required coverage level and examine output for errors. More experience with structural analysis is needed before reliable estimates of analysis time and cost can be developed. Available data suggests that using structural analysis to debug programs requires 0.5 to 2.0 worker-days per error found.
Considerable data is available on the computer overhead of instrumentation tools. The amount of overhead depends on several factors, including the level of instrumentation and the options selected. Generally, instrumentation tools require:
·         A 20% to 100% increase in program size.
·         A 2% to 50% increase in execution time.
There are numerous instrumentation tools available from government and commercial sources; most sell for less than $10,000.
It is difficult to estimate the total costs of structural analysis, because no one knows how to estimate the analysis time or number of test runs required. A user of structural analysis on a large software project claimed a significant cost saving over traditional testing methods.
Performance Measurement: Execution Time and Resource Analysis
There are two types of software performance measurement techniques: execution time and resource analysis, and algorithm complexity analysis. These two techniques differ in ease of use and maturity. The former technique is a more practical approach to the measurement of resource use, whereas the latter technique is a more rigorous approach.
Execution time and resource analysis involves monitoring the execution of a program to locate and identify possible areas of inefficiency in the program. Execution data is obtained through a random sampling technique used while the program executes in its usual environment or by the insertion of probes into the program. The probes consist of calls to a monitor that records execution information (e.g., CPU and I/O time). At the end of execution, reports are generated that summarize the resource use of the program.
Information Input. This technique requires the program source code and any data necessary for the program to execute.
Information Output. The output produced by this technique consists of reports that show by statement, groups of statements, or module the execution time distribution characteristics and resource uses. This includes information showing per module the number of entries to the module, cumulative execution time, mean execution time per entry, and the percent execution time of the module with respect to the total program execution time.
Other types of reports include:
·         A summary of all the sample counts made during data extraction (e.g., the number of samples taken where the program was executing instructions, waiting for the completion of an I/O event, or otherwise blocked from execution).
·         A summary of the activity of each load module.
·         An instruction location graph that gives the percentage of time spent for each group of instructions partitioned in main storage.
·         A program time line that traces the path of control through time.
·         A control passing summary that gives the number of times control is passed from one module to another.
·         A wait profile showing the number of waits encountered for each group of instructions.
·         A paging activity that displays pages-in and pages-out for each group of instructions.
·         A performance profile showing the amount of CPU time, execution time, and primary and secondary storage used.
Outline of Method. Execution time and resource analysis typically consist of two processing units. The first unit runs the program being monitored and collects data concerning the execution characteristics of the program. The second unit reads the collected data and generates reports from it. Two variations in implementing the technique are described in the following sections.
Method 1. This method involves monitoring a program by determining its status at periodic intervals. The period between samples is usually controlled through an elapsed interval timing facility of the operating system. Samples are taken from the entire address range addressable by the executing task. Each sample may contain an indication of the status of the program, the load module in which the activity was detected, and the absolute location of the instruction being executed. Small sample intervals increase sampling accuracy but result in a corresponding increase in the overhead required by the CPU.
Main storage utilization of a program can be defined as how well the program uses the storage that has been allocated to it. In general, poor use of primary storage occurs when the program requests instructions or data that are not currently resident; this is called a page fault because a new page of information must be requested from secondary storage. Main storage utilization information is obtained by accumulating ordered pairs of page fault counts and program counters. Because the program counter maps into a machine-level representation of the program code, only a crude measure of statement locality in the high-level code can be achieved. This information can be displayed with a histogram; each histogram bar is generated beside a grouping of program statements, showing where poor storage use is occurring.
The statistics gathered by the data extraction unit are collected and summarized in reports generated by the data analysis unit. References to program locations in these reports are in terms of absolute addresses; to relate these absolute locations to source statements, the reports also provide a means of finding, in a compiler listing, the source statement that corresponds to each instruction. Sources of waits and program locations that use significant amounts of CPU time can thus be identified directly in the source code, and performance improvements can be concentrated on these identified statements.
Method 2. This method involves the insertion of probes (i.e., program statements) into the program at various locations of interest. There are two ways of determining such information as the CPU time necessary to execute a sequence of statements. In the first, a probe execution results in a call to a data collection routine that records the CPU clock time at that instant; a second probe execution results in a second call to the data collection routine. A subtraction of the first CPU time from the second yields the net CPU time used. In the second method, the probes are used to record the execution of program statements. Each statement is associated with a machine-dependent estimate of the time required for its execution. The execution time estimate is multiplied by the statement execution count to give an estimate of the total time spent executing the statements. This is done for all statements in a program. Reports can be produced showing execution time breakdowns by statement, module, and statement type.
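As a rough sketch of the probe approach (Method 2), the following Python fragment is offered purely as an illustration; the probe and report routines and their names are invented for the example. Each probe call records the CPU clock, and the report pairs successive readings to obtain the net CPU time for each labelled section.

import time

cpu_samples = {}                        # data collected by the probes

def probe(label):
    # Data-collection routine: record the CPU clock at this instant.
    cpu_samples.setdefault(label, []).append(time.process_time())

def report():
    # Data-analysis unit: pair successive readings and summarize net CPU time.
    for label, samples in cpu_samples.items():
        starts, stops = samples[0::2], samples[1::2]
        net = sum(b - a for a, b in zip(starts, stops))
        print(label, ":", round(net, 6), "seconds CPU over", len(stops), "entries")

def work(n):
    probe("squares loop")               # first probe
    total = sum(i * i for i in range(n))
    probe("squares loop")               # second probe; the difference is net CPU time
    return total

work(100000)
work(200000)
report()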
Example. Three examples of this technique follow.
Application. A program that solves a set of simultaneous equations is constructed. The program first generates a set of coefficients and a right-hand side for the problem being solved. It then solves the problem and delivers the solution.
Error. In the set of calculations required to solve the problem, a row of coefficients is divided by a constant and then subtracted from another row of coefficients. The divisions are performed within a nested DO-loop but should be moved outside the innermost loop, because neither the dividend nor the divisor changes within that loop.
Error Discovery. The performance of the program is evaluated through the use of a software monitor. Examination of the output reveals that the program spends almost 85% of its time in a particular address range. Further analysis shows that 16.65% of all CPU time is used by a single instruction. A compiler listing of the program is used to locate the source statement that generated this instruction, which is found to be the statement containing the division instruction. Once the location of the inefficiency is discovered, it is left to the programmer to determine whether and how the code can be optimized.
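The kind of improvement involved can be sketched as follows (Python is used only for illustration, and the array names are invented; the actual program was a FORTRAN equation solver). Because neither operand of the division changes inside the loops, the division can be replaced by a reciprocal computed once:

n, c = 200, 3.0
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[0.0] * n for _ in range(n)]

# Inefficient: the division by the constant c is repeated in the innermost
# loop even though the divisor never changes there.
for i in range(n):
    for j in range(n):
        b[i][j] -= a[i][j] / c

# Improved: hoist the invariant work out of the loops by computing the
# reciprocal once; a profiler would have pointed at the division above.
r = 1.0 / c
for i in range(n):
    for j in range(n):
        b[i][j] -= a[i][j] * r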
Application. A particular module in a real-time, embedded computer system is required to perform its function within a specific time period. If not, a critical time dependent activity cannot be performed, resulting in the loss of the entire system.
Error. The module in question contained an error that involved performing unnecessary comparisons during a table lookup function, although the proper table entry was always found.
Error Discovery. The problem was discovered during system testing through the use of an execution time analyzer that clearly indicated that the offending module was not able to meet its performance requirements. The specific error was discovered on further examination of the module.
The following example demonstrates the use of resource utilization information.
Application. The current work assignment requires that code be written in FORTRAN. The application requires the use of a two-dimensional array for data representation.
Error. A programmer whose most familiar language is C proceeds to program this application as it would be written in C. This particular FORTRAN implementation, however, stores data in matrices by columns, whereas C implementations store data by rows.
Error Discovery. A histogram of the main storage utilization reveals that a huge amount of page faulting occurs in a loop where the two-dimensional array is manipulated. The programmer reviews the code and modifies the array indices so that the leftmost index varies fastest; as a result, the array is traversed by columns rather than by rows.
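A rough analogue of the storage-order problem can be written with NumPy, whose order='F' layout mirrors FORTRAN's column-major storage (the sketch assumes NumPy is available and is meant only to show the access pattern, not to serve as a benchmark):

import numpy as np

n = 500
a = np.zeros((n, n), order='F')     # column-major storage, as in FORTRAN

# Poor locality for column-major storage: the rightmost index varies fastest,
# so successive accesses touch widely separated storage locations.
for i in range(n):
    for j in range(n):
        a[i, j] = i + j

# Better locality: the leftmost index varies fastest, so each column is
# filled with consecutive storage accesses.
for j in range(n):
    for i in range(n):
        a[i, j] = i + j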
Effectiveness. Execution time and resource analysis is a valuable technique for identifying performance problems in a program. Typically, the majority of a program's execution time is spent in a very small percentage of its code. Knowledge of the location of this execution-time-critical code is helpful in optimizing a program to satisfy performance requirements and reduce costs.
Applicability. The primary value of the technique is as a tool for formally validating performance requirements, which must be clearly stated and associated with specific functional requirements. Moreover, the system should be designed so that the functional requirements can be traced to specific system modules. This technique can be applied to any kind of program in any programming language and is applicable during the unit test through the verification test phases.
Maturity. Execution time and resource analysis is a widely used, mature technique.
User Training. There are no special learning requirements for the use of this technique. If it is to be used effectively, however, the input parameters must be carefully selected in determining the most relevant reports to be generated. Once the areas of a program that are most inefficient have been identified, skill is required to modify the program to improve its performance.
Costs. The largest cost of this technique is that incurred by the CPU to extract the data during execution. In one implementation, extraction of data resulted in an increase of user program CPU time by 1% to 50%. Storage requirements also increase to provide main storage for diagnostic tables and the necessary program modules of the tool.
Performance Measurement: Algorithm Complexity Analysis
Two phases of algorithm complexity analysis can be distinguished: a priori analysis and a posteriori testing. In a priori analysis, a function (of relevant parameters) is devised that bounds the algorithm's use of time and space to compute an acceptable solution. The analysis assumes a model of computation (e.g., a Turing, random-access, or general purpose machine). Two kinds of problems are usually treated: analysis of a particular algorithm and analysis of a class of algorithms. In a posteriori testing, actual statistics are collected about the algorithm's consumption of time and space while it is executing.
Information Input. The specification of the algorithm and the source code representing the algorithm are necessary input.
Information Output. The output of a priori analysis consists of:
·         Confidence in the algorithm's validity.
·         Upper and lower computational bounds.
·         Prediction of space use.
·         Assessment of optimality.
The output of a posteriori testing is a performance profile.
Outline of Method. The methodology of a priori analysis and a posteriori testing is outlined in the following sections.
A Priori Analysis. Algorithms are analyzed to improve them, if possible, and to choose among available algorithms. The following criteria may be used:
·         Correctness.
·         Amount of work done.
·         Amount of space used.
·         Simplicity.
·         Optimality.
Correctness. The following steps are involved in establishing the correctness of an algorithm:
·         Understanding that an algorithm is correct if, when given a valid input, it computes for a finite amount of time and produces the right answer.
·         Verifying that the mathematical properties of the method or formulas used by the algorithm are correct.
·         Verifying by mathematical argument that the instructions of the algorithm do produce the correct answer and do terminate.
Amount of Work Done. A priori analysis ignores all of the factors that are machine- or language-dependent and concentrates on determining the order of magnitude of the frequency of statement execution. For denoting the upper bound on an algorithm, the O-notation is used, whose definition is:
f(n) = O(g(n)) if and only if there exist two positive constants C and n0
such that f(n) <= Cg(n) for all n >= n0
The most common computing times for algorithms are:
O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3) < O(2^n)
O(1) means that the number of executions of basic operations is fixed and therefore the total time is bounded by a constant. The first six orders of magnitude are bounded by a polynomial, but there is no integer m such that n^m bounds 2^n; an algorithm whose computing time grows this fast is said to require exponential time. There are also notations for lower bounds and for asymptotic bounds. Complexity is the formal term for the amount of work done, as measured by a chosen complexity (or cost) measure.
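For example, f(n) = 3n^2 + 5n + 7 is O(n^2): taking C = 4 and n0 = 7, 3n^2 + 5n + 7 <= 4n^2 for all n >= 7, because n^2 >= 5n + 7 once n >= 7.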
In general, the amount of work done by an algorithm depends on input size. In some cases, the number of operations may depend on the particular input. Table 5 shows some examples of size.
To handle the situation of the input affecting algorithm performance, two approaches are used: average and worst-case analysis. The average approach assumes a distribution of input, calculates the number of operations performed for each type of input in the distribution, and computes a weighted average. The worst-case approach calculates the maximum number of basic operations performed on any input of a fixed size.
Amount of Space Used. The number of main storage cells used by a program, like the number of seconds required for its execution, depends on the particular implementation. Some conclusions about space use, however, can be drawn by examining the algorithm. A program requires storage space for its instructions, the constants and variables it uses, and the input data. It may also use work space for manipulating the data and storing information needed to carry out its computations. The input data itself may be represented in several forms, some of which require more space than others. If the input data has one natural form (e.g., an array of numbers or a matrix), any extra space used is analyzed apart from the space for the program and the input. If the amount of extra space is constant with respect to the input size, the algorithm is said to work "in place."
Simplicity. Often, the simplest and most straightforward way of solving a problem is not the most efficient. Simplicity in an algorithm is nevertheless desirable. It may make verifying the correctness of the algorithm easier, and it makes writing, debugging, and modifying a program for the algorithm easier. The programmer should consider the time needed to produce a debugged program when choosing an algorithm; if the program is to be used frequently, on the other hand, its efficiency is probably the determining factor.
Optimality. The following tasks must be carried out to determine how much work is necessary to solve a problem:
·         Devising an apparently efficient algorithm A and analyzing A to find a function g such that, for input of size n, A does at most g(n) basic operations.
·         For a function f, proving that for any algorithm in the class under consideration there is some input of size n for which the algorithm must perform at least f(n) basic operations. If the functions g and f are equal, then the algorithm A is optimal.
A Posteriori Testing. Once an algorithm has been analyzed, the next step is usually to confirm the analysis. The confirmation process consists first of devising a program for the algorithm on a particular computer. After the program is operational, the next step is producing a performance profile--that is, determining the precise amounts of time and storage the program consumes. To determine time consumption, the computer clock is used. Several data sets of varying size are executed and a performance profile is developed and compared with the predicted curve.
Examples of Size of Input

Problem                                           Size of Input
1. Find X in a list of names                      Number of names in the list
2. Multiply two matrices                          Dimensions of the matrices
3. Solve a system of linear equations             Number of equations and solution vectors

Table 5.
A second way to use the computer's timing capability is to take two programs for performing the same task whose orders of magnitude are identical and compare them as they process data. The resulting times show which, if any, program is faster. Changes to one program that do not alter the order of magnitude but that purport to speed up the program can also be tested in this way.
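A minimal sketch of such timing runs follows (Python, using the library sort as a stand-in for the program under test); a roughly constant ratio of measured time to n log n across the data sets supports the predicted curve:

import math
import random
import time

for n in (10000, 20000, 40000, 80000):
    data = [random.random() for _ in range(n)]
    start = time.process_time()
    data.sort()                                  # program under test
    elapsed = time.process_time() - start
    # Normalize by the predicted n log n growth; a roughly constant ratio
    # confirms the a priori analysis.
    print(n, elapsed, elapsed / (n * math.log(n)))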
Example. QUICKSORT is a recursive sorting algorithm. It rearranges the keys and splits the file into two subsections, or subfiles, such that all keys in the first section are smaller than all keys in the second section. Then QUICKSORT sorts the two subfiles recursively (i.e., by the same method), with the result that the entire file is sorted.
A is the array of keys, and m and n are the indices of the first and last entries, respectively, in the subfile that QUICKSORT is currently sorting. Initially m = 1 and n = k, the number of keys in the file. The PARTITION algorithm chooses a key K from the subfile and rearranges the entries, finding an integer j such that for m <= i < j, A(i) <= K; A(j) = K; and for j < i <= n, A(i) >= K. K is then in its correct position and is ignored in the subsequent sorting.
QUICKSORT can be described by the following recursive algorithm:
QUICKSORT (A,m,n)
If m < n then do             PARTITION (A,m,n,i,j)
                                         QUICKSORT (A,m,j)
                                         QUICKSORT (A,i,n)
                                         end
The PARTITION routine may choose as K any key in the file between A(m) and A(n); for simplicity, let K = A(m). An efficient partitioning algorithm uses two pointers, i and j, initialized to m and n + 1, respectively, and begins by copying K elsewhere so that the position A(i) is available for some other entry. The location A(i) is filled by decrementing j until A(j) < K, and then copying A(j) into A(i). Now A(j) is filled by incrementing i until A(i) > K, and then copying A(i) into A(j). This procedure continues until the values of i and j meet; then K is put in the last place. PARTITION compares each key except the original in A(m) with K, so it does n - m comparisons.
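The method just described can be transcribed directly into code. The following Python sketch is illustrative only; unlike the pseudocode above, its partition returns a single split index rather than the pair (i, j):

def partition(a, m, n):
    # Partition a[m..n] about the key k = a[m]; return k's final index.
    k = a[m]                         # copy the key out; a[m] is now a free slot
    i, j = m, n + 1
    while True:
        j -= 1
        while j > i and a[j] > k:    # scan down for an entry <= k
            j -= 1
        if j == i:                   # pointers met; the free slot is a[i]
            break
        a[i] = a[j]                  # move the small entry left; a[j] is now free
        i += 1
        while i < j and a[i] < k:    # scan up for an entry >= k
            i += 1
        if i == j:                   # pointers met; the free slot is a[j] (= a[i])
            break
        a[j] = a[i]                  # move the large entry right; a[i] is now free
    a[i] = k                         # the key goes into its final position
    return i

def quicksort(a, m, n):
    # Sort a[m..n] in place by recursive partitioning.
    if m < n:
        j = partition(a, m, n)
        quicksort(a, m, j - 1)       # entries no larger than the key
        quicksort(a, j + 1, n)       # entries no smaller than the key

data = [53, 12, 171, 3, 27, 501, 390, 201, 251, 0]
quicksort(data, 0, len(data) - 1)
print(data)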
Worst-Case Analysis. If, when PARTITION is executed, A(m) is the largest key in the current subfile—that is, A(m) > A(i) for m < i <= n—then PARTITION moves it to the bottom position A(n) and partitions the file into one section with n - m entries (all but the bottom one) and one section with no entries. All that has been accomplished is moving the maximum entry to the bottom. Similarly, if the smallest entry in the file is in position A(m), PARTITION simply separates it from the rest of the list, leaving n - m items still to be sorted. Therefore, if the input is arranged so that each time PARTITION is executed, A(m) is the largest (or the smallest) entry in the section being sorted, and p = n - m + 1 is the number of keys in that section, the number of comparisons performed is:

(n - 1) + (n - 2) + . . . + 2 + 1 = (n^2 - n)/2
Average Behavior Analysis. If a sorting algorithm removes at most one inversion from the permutation of the keys after each comparison, it must do at least (n^2 - n)/4 comparisons on the average. QUICKSORT, however, does not have this restriction. The PARTITION algorithm can move keys across a large section of the entire file, eliminating as many as n - 2 inversions at one time. QUICKSORT deserves its name because of its average behavior. For example, assume that each time PARTITION is executed, it splits the file into two roughly equal subfiles. To simplify the computation, let n = 2^p - 1 for some p. The number of comparisons done by QUICKSORT on a file with n entries under these assumptions is described by the recurrence relation:
R(p) = 2^p - 2 + 2R(p - 1)
R(1) = 0
The term 2^p - 2 equals n - 1, the number of comparisons done by PARTITION the first time; the term 2R(p - 1) is the number of comparisons done by QUICKSORT to sort the two subfiles, each of which has (n - 1)/2, or 2^(p-1) - 1, entries. Expanding the recurrence relation gives

R(p) = 2^p - 2 + 2R(p - 1)
     = 2^p - 2 + 2(2^(p-1) - 2) + 4R(p - 2)
     = 2^p - 2 + 2^p - 4 + 2^p - 8 + 8R(p - 3)

and therefore, summing i from 1 to p - 1,

R(p) = Σ (2^p - 2^i) = (p - 1)2^p - Σ 2^i
     = (p - 1)2^p - (2^p - 2) = (n + 1)log(n + 1) - 2n
Therefore, if A(m) were close to the median each time the file is split, the number of comparisons done by QUICKSORT would be on the order of n log n. If all permutations of the input data are assumed equally likely, then QUICKSORT does approximately 2n log n comparisons.
Space Use. At first glance it may seem that QUICKSORT is an in-place sort, but it is not. While the algorithm is working on one subfile, the beginning and ending indexes (i.e., the borders) of all the other subfiles yet to be sorted must be saved on a stack, and the size of the stack depends on the number of sublists into which the file is split. This, of course, depends on n. In the worst case, PARTITION may split off one entry at a time in such a way that n pairs of borders are stored on the stack. Therefore, the amount of space used by the stack is proportional to n (see Table 6).
Effectiveness. Algorithm analysis has become an important part of computer science. The only issue that limits its effectiveness is that a particular analysis depends on a particular model of computation. If the assumptions of the model are inappropriate, the analysis suffers.
This technique examines algorithms with the intention of improving them and also provides a means to examine the nature of algorithms (e.g., correctness, amount of work done, amount of space used, simplicity, and optimality).
Applicability. This technique is applicable during the algorithm confirmation phase through unit testing and can be limited by the size of the application because the analysis can be lengthy. This technique is most applicable to numerical applications because many of these types of algorithms have been analyzed.
Maturity. Algorithm analysis requires experienced personnel with specialized knowledge to implement this technique. It is not widely used.
User Training. Algorithm analysis requires significant training in mathematics and computer science. Generally, it is done by a specialist.
A Comparison of Stack Sizes

n                 1000  2000  3000  4000  5000
MERGESORT   500    1050  1650  2250  2900
QUICKSORT     400    850    1300  1800  2300

Table 6
Costs. The cost to analyze an algorithm depends on the complexity of the algorithm and the amount of understanding about algorithms of the same class.
Executable Assertion Testing
Executable assertion testing is a three-step process that generates the assertions, translates the assertions into processable program statements, and processes the assertions. Assertion generation is a method of capturing the intended functional properties of a program in a special notation, called the assertion language, for insertion into the various levels of the program specification, including the program source code. Once these assertions are identified, they are translated into compilable statements. Assertion processing is the process of checking the assertions of a program during execution. This technique serves as a bridge between the more formal program correctness proof approaches and the more common black-box testing approaches.
Executable assertions are special statements inserted into the source code of a program. They allow the programmer to specify required conditions for correct operation of the program. If such a condition does not hold during execution of the program, this fact is reported by an error message. The programmer can also specify actions to be taken when an assertion is violated.
Most compilers do not recognize and translate assertions; an assertion preprocessing tool must be used. The tool generates code (in the same language as the rest of the program) that carries out the condition checking and error-handling logic for the assertion. Different preprocessing tools recognize different forms of assertions. A programmer can augment a less powerful tool by writing code to do some of the condition checking.
Information Input. A specification of the desired functional properties of the program is the input required for assertion generation. For individual modules, this breaks down at a minimum to a specification of the conditions that are assumed true on a module entry and a specification of the conditions desired on the module exit. If the specifications from which the assertions are to be derived include algorithmic detail, they indicate conditions that are to hold at intermediate points within the module as well. Generally, assertions are specified in the form of comments in the source program. A program containing user-specified assertions is then processed by the dynamic assertion processor.
Information Output. The assertions created from the functional or algorithmic specifications are expressed in a notation called the assertion language, which commonly includes higher-level expressive constructs than are found in the programming language. An example of such a construct is a set. Most commonly, the assertion language is equivalent in expressive power to first-order predicate calculus; therefore, expressions such as "there exists x such that f(x) = 0" are possible. The generated assertions expressing the functional properties of the program can then be used as input to a dynamic assertion processor, a formal verification tool, walkthroughs, specification simulators, and inspections, among other testing techniques. The output of assertion processing consists of a list of the assertion checks performed and a list of exception conditions with trace information for determining the nature of the violation.
Outline of Method. Executable assertions are constructs added to a programming language that indicate by an output message that something has gone wrong in a program and permit the programmer to specify action that should be taken when such an error occurs. The general form of an executable assertion is:
ASSERT condition;
FAIL block;
The condition is an expression that can be evaluated logically (as TRUE or FALSE) during execution of the program. The fail block is optional; it contains the error-handling code.
Assertions must be translated into executable code. This is usually done by a preprocessing tool, although some compilers accept and translate assertions. The kinds of conditions that can be checked by assertions and the syntax for declaring these conditions vary from tool to tool. The types of assertions accepted by a tool are often referred to as its assertion language.
The general form of a translated assertion is:
IF (NOT condition) THEN
     Print error message;
     Execute fail block
END IF
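A hand-written equivalent in a language without an assertion statement looks much the same. The following Python sketch uses a hypothetical routine, sqrt_table, to show a condition check followed by a fail block that reports the violation and substitutes a safe value:

def sqrt_table(values):
    # ASSERT condition: every input value must be non-negative.
    if not all(v >= 0 for v in values):
        # FAIL block: report the violation and the offending data, then
        # substitute a safe value so that processing can continue.
        print("assertion violated in sqrt_table: negative input", values)
        values = [max(v, 0) for v in values]
    return [v ** 0.5 for v in values]

print(sqrt_table([4.0, 9.0, -1.0]))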
The programmer must decide which assertion checks to make, encode them in the assertion language, and insert them in the code. Assertions should be programmed at the same time as the code itself for the following reasons:
·         Writing the assertions increases the programmer’s understanding of the purpose and design of the program.
·         The assertions themselves may contain mistakes that must be debugged.
·         Assertions are useful throughout the life of the program but may be turned off when the code is introduced for operational use so that run-time efficiency is not affected.
·         Adding a full set of assertions to a large, already coded program is a tedious job that no one will want to do.
The processing of the assertion violation keeps track of the total number of violations for each assertion, prints a message indicating that a violation of the assertion has occurred, and prints the values of the variables referenced in the assertion. In addition, the number of times the assertion is checked may be kept and printed when a violation occurs. The information collected about violation detection should be of sufficient quantity and quality to explain the specific error to the programmer. Specifying assertions within comments is a valuable form of documentation and also ensures that the source program is kept free of non-portable, tool-specific directives.
Introducing assertions must not alter the functional behavior of a program. Execution time, however, is increased, the amount of which depends on the number of assertions that are processed.
The reliability of assertions depends on the programmer, who must understand the way the program is supposed to operate, be familiar with the assertion language, and be thorough in the use of assertions. Assertions must be debugged just like the rest of a program. Test data should be generated that causes the execution of each assertion.
The test data used has a great effect on the reliability of executable-assertion testing. Test data must cause assertions to be violated, or errors go undetected. To be effective, assertion testing should be combined with a systematic method (e.g., structural or functional testing) of generating test data.
Example. Because executable assertion testing is so closely entwined with program development, only brief examples are given. An example of assertion generation and translation is followed by an example of assertion processing.
During program development, the requirement arises for sorting the elements of an array, or table. To support flexible processing in the rest of the system, the array is declared with a large, fixed length. Only a portion of the array, however, has elements in it. The number of elements currently in the array, when passed to the sort routine, is contained in the first element of the array. The array must always be sorted in ascending order. The sorted array is returned to the calling program through the same formal parameter.
The first specification of the sort routine may appear as follows:
          SUBROUTINE SORT (A, DIM)
          C      
          C       A is the array to be sorted
          C       DIM is the dimension of A
          C
          C
          C       sort array
          C
                   RETURN
                   END
The characteristics of the subroutine may be partially captured by the following assertions:
ASSERT INPUT (0 <= A(1) < DIM), (DIM > 2)
ASSERT OUTPUT ((A(1) = 0 v A(1) = 1) ^ TRUE) v
          (A(1) > 1 ^ FORALL I IN [2..A(1)] A(I) <= A(I + 1))
The input assertion notes the required characteristics of A(1) and DIM. The output assertion indicates that if there are 0 or 1 elements in the array, the array is sorted by default. If there are at least 2 elements in the array, the array is in ascending order.
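The same input and output assertions can be written as executable checks. The following Python sketch mirrors the convention of the FORTRAN example (the element count is stored in the first cell and the data in the cells that follow); the routine name sort_table is invented for the illustration:

def sort_table(a, dim):
    # a[0] holds the element count; a[1..a[0]] hold the data.
    # Input assertion: the count must fit within the declared dimension.
    assert 0 <= a[0] <= dim - 1 and dim >= 2, "input assertion violated"
    count = a[0]
    a[1:count + 1] = sorted(a[1:count + 1])
    # Output assertion: zero or one element is sorted by default; otherwise
    # every element must be no larger than its successor.
    assert count <= 1 or all(a[i] <= a[i + 1] for i in range(1, count)), \
        "output assertion violated"
    return a

print(sort_table([4, 27, 3, 12, 0, 99, 99], 7))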
The next cut at the program may have the following appearance. The following is an intermediate assertion:
          SUBROUTINE SORT (A, DIM)
          C      
          C       A is the array to be sorted
          C       DIM is the dimension of A
          C
                   ASSERT INPUT (0 <= A(1) < DIM), (DIM > 2)
                   IF (A(1).LE.1) GOTO 100
                   ASSERT (2 <= A(1) < DIM)
          C      
          C       sort nontrivial array
          C
          100    ASSERT OUTPUT ((A(1) = 0 v A(1) = 1) ^ TRUE) v
                             (A(1) > 1 ^ FORALL I IN [2..A(1)] A(I) <= A(I + 1))
                   RETURN
                   END
A straight selection sort algorithm is chosen for the nontrivial case; that is, find the smallest element and place it in A(2), find the next smallest and place it in A(3), and so forth, where the original contents of A(2) are exchanged with the element that belongs there in the sorted array. An appropriate intermediate assertion is included within the sorting loop:
          C
          C       PERFORM STRAIGHT SELECTION SORT
                   DO 50 J = 2, A (1)
          C                 find smallest element in A(J)..A(A(1) + 1)
          C                 let the element be A(K)
          C                 exchange A(J) and A(K)
          C      
                             ASSERT (2 <= J <= A(1)),
                                      (FORALL I IN [2..J] A(I) <= A(I + 1))
          50      CONTINUE
A significant issue that has not been addressed yet is asserting on termination that the sorted array is a permutation of the original array (i.e., to assert that in the process of sorting no elements were lost). To do this at the highest level, the first cut at the program requires advanced assertion language facilities.
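In a language with built-in collection types the permutation property is easy to state. The following Python sketch (sort_checked is a hypothetical wrapper, not part of the example above) compares the multisets of elements before and after sorting:

from collections import Counter

def sort_checked(a):
    before = Counter(a)                 # multiset of the original elements
    result = sorted(a)
    # Assert that sorting neither lost nor invented elements: the output
    # must be a permutation of the input.
    assert Counter(result) == before, "output is not a permutation of the input"
    return result

print(sort_checked([3, 1, 2, 3]))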
The following is an example of assertion processing. The program segment in Figure 14 is taken from a Pascal program that calls on routine SORT to sort array A, consisting of N integer elements, in ascending order. The assertion following the sort command asserts that the elements are indeed in ascending order on return from the sort procedure.
The program segment in Figure 15 is the one that results after all the assertions have been translated into Pascal. A large number of statements were needed to implement the assertion because of the involved checking required for an "assert forall . . ."; simpler assertions require fewer statements. The amount of generated code could be reduced through the use of a common assertion-violation procedure.
Table 7 shows values of A used in successive executions of routine sort. The resulting execution produced the following assertion violation:
violation of assertion 3 at statement 57 on execution: 3
array A = 3 12 27 53 171 201 251 390 501 0
This was the only violation that occurred. Subsequent analysis of the sort procedure indicated that the error was the result of an off-by-one error on a loop limit.
Values of A Used in Routine Sort
Execution                                                Array A
          1        0        3        12      27      53      171    201    251    390    501
          2        0        12      3        53      27      201    171    290    251    501
          3        501    390    251    201    171    53      27      12      3        0
          4        0        0        0        0        0        0        0        0        0        0
          5        0        0        0        100    100    100    999    999    999    1000
Table 7

Effectiveness. Executable assertion testing, particularly when used in conjunction with an allied technique like functional testing or structural testing, can be extremely effective in aiding the testing of a program. Such effectiveness is possible, however, only when the assertions are used to capture the important functional properties of the program. Such assertions as the following are of no use at all:
I = 0
I = I + 1
ASSERT I > 0
Capturing the important properties can be a difficult and error-prone process. Such effort is well rewarded, however, by increased understanding of the problem to be solved. Indeed, assertion generation is effective precisely because the assertions are intended to be redundant with the program specifications; this redundancy enables the detection of errors. A cost-effective procedure, therefore, is to develop intermediate assertions only for particularly important parts of the computation. Input and output assertions should be employed whenever possible.
                             ·
                             ·
          12      Var
          13               N:integer;
                             A:array [1..MAXN] of integer;
                             ·
                             ·
          26      begin
                             ·
                             ·
          56               sort (N,A);
          57               (*assert forall I in [1..N-1]: A[I] <= A[I + 1]*);
Figure 14. Source Program with Untranslated Assertions

          12      Var
          13               N: integer;
          14               A: array [1..MAXN] of integer;
          15               AssertVioCount: array [1..NumofAsserts] of integer;
          16               AssertXqtCount: array [1..NumofAsserts] of integer;
          17               assert: boolean;
                                      ·
                                      ·
          29               begin
                                      ·
                                      ·
          77               sort(N,A);
          78               (*assert forall I in [1..N-1]: A[I] <= A[I + 1]*);
          79                         AssertXqtCount[3] := AssertXqtCount[3] + 1;
          80                         assert := true;
          81                         I := 1;
          82                         while (I <= N - 1) and assert do (*check assertion*)
          83                                  if A[I] > A[I + 1] then
          84                                            assert := false
          85                                  else
          86                                            I := I + 1;
          87                         if not assert then begin (*assertion violation*)
          88                                  AssertVioCount[3] := AssertVioCount[3] + 1;
          89                                  writeln ('violation of assertion 3 at statement 57');
          90                                  writeln ('on execution: ', AssertXqtCount[3]);
          91                                  writeln ('array A = ', A)
          92                         end (*assertion violation*);
Figure 15. Translated Assertions
The effectiveness of executable assertion testing depends on the quality of the assertions included in the program being analyzed. Moreover, if the translation is being done without the use of a dynamic assertion processor, the amount of time required to translate, coupled with the unreliability associated with the process, reduces its effectiveness. Nevertheless, the technique can be of significant value in revealing the presence of program errors.
Assertions can be used to detect a wide variety of errors. They are most effective against computation errors, but have also been effective in catching logic, data input, data-handling, interface, data definition, and data base errors.
Executable assertions can detect any error that can be expressed as a condition in the assertion language. Some important examples of such errors are:
·         The result of a computation is outside a range of reasonable values or is inconsistent with another result.
·         A variable does not behave as intended; it changes value when it should not, or it does not change in the desired way.
·         Control flow is incorrect; the branch taken is incompatible with program conditions, or a special case is not handled properly.
·         A call to a routine results in an unacceptable condition on return.
·         The output of a routine is incorrect.
In addition to detecting errors, executable assertions can do the following:
·         Indicate that a program is operating incorrectly.
·         Help the programmer to locate errors.
·         Indicate that a program is being used improperly.
·         Provide fault-tolerance in a program.
·         Express specifications and design intentions as in-line documentation of the program.
·         Form the basis for a formal verification of a program.
Applicability. This technique is generally applicable during unit, module, and integration test phases. If run-time efficiency is not a problem, the assertions may be left active in the operational program.
Maturity. Tools that process executable assertions exist for most common high-order languages (e.g., COBOL, FORTRAN, Jovial, PL/I, Pascal), because these languages do not include assertions as a statement type. Executable assertion testing is a fairly mature test technique.
User Training. To write assertions, a programmer needs to understand how the program is designed and coded and needs to develop skill in handling assertion constructs; this comes with a little experience. A programmer must then:
·         Find out enough about the application area of the program to develop tight bounds on the values of variables and the results of computations.
·         Write assertions that can trap special error conditions (e.g., logic and data flow errors). This can be difficult when using an assertion language of limited power.
·         Be thorough--all conditions that can be checked by assertions must be identified.
Costs. The costs of assertion testing depend on the number of assertions placed in the code, the difficulty of writing and debugging them, and the number of test runs made with the asserted program. Little data or experience can be used to gauge the magnitude of these costs. Writing and debugging assertions can be expected to add significantly to costs at the beginning of a project, but the overhead of making test runs should not be as great.
The computer resources used in assertion testing depend on how thoroughly assertions are used. The categories of computer overhead are:
·         The time required by the preprocessor and compiler to turn the assertion into executable code.
·         The extra execution time required by the assertion checks.
·         The extra space required by the source and object versions of the program because of the assertions.
The preprocessor and compiler overhead is incurred each time either the assertions or the code is changed. The execution time overhead is incurred once for each test run. Once a production version of the program is achieved, the assertions can be removed by recompiling the program with the assertions disabled; that is, by compiling the source program with the assertions in the form of comments. Execution time overhead is thus avoided by the end user. The best available estimate of the cost in analyst time to develop the assertions is the cost to write an equivalent amount of code.
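Some languages build this switch in. For instance, Python removes its assert statements when the interpreter is run with the -O option, which has the same effect as recompiling with the assertions turned back into comments; the fragment below is only a small illustration:

import math

def area(r):
    assert r >= 0, "radius must be non-negative"   # removed under python -O
    return math.pi * r * r

print(area(2.0))
# Run as "python prog.py" to check the assertion during testing, or as
# "python -O prog.py" to strip assert statements and avoid the overhead.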
Interactive Test and Debugging Aids
Interactive test and debugging aids are tools used to control or analyze the dynamics of a program during execution. The capabilities provided by these tools are used to help identify and isolate program errors. These capabilities allow the user to:
·         Suspend program execution at any point to examine program status.
·         Interactively dump the values of selected variables and main storage locations.
·         Modify the computation state of an executing program.
·         Trace the control flow of an executing program.
Another common name for this technique is symbolic debugger.
Information Input. Interactive test and debugging aids require as input the source code to be executed and the commands that indicate which testing operations are to be performed by the tool during execution. Included in the commands are indications of which program statements are to be affected by the tool's operation. Commands can be inserted in the source code or entered interactively by the user during program execution at preselected break points.
Information Output. The information output by an interactive test and debugging aid is a display of requested information during the execution of a program. This information may include the contents of selected storage cells at specific execution points or a display of control flow during execution.
Outline of Method. The functions performed by an interactive test and debugging aid are determined by the commands input to it. Some common commands are:
·         BREAK-Suspend program execution when a particular statement is executed or a particular variable is altered.
·         DUMP-Display the contents of specific storage cells (e.g., variables, internal registers, other storage locations).
·         TRACE-Display control flow during program execution through printed traces of:
§  Statement executions (using statement labels or line numbers).
§  Subroutine calls.
§  Alterations of a specified variable.
·         SET-Set the value of a specified variable.
·         CONTENTS-Display the contents of certain variables at the execution of a specific statement.
·         SAVE-Save the present state of execution.
·         RESTORE-Restore execution to a previously SAVED state.
·         CALL-Invoke a subroutine.
·         EXECUTE-Resume program execution at a BREAK point.
·         EXIT-Terminate processing.
These commands grant complete control over the computation state of an executing program and enable the tester to inspect or change the value of any variables at any point during execution.
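Modern symbolic debuggers expose essentially the same command set. As a rough analogue, Python's pdb supports breakpoints, display, assignment, and resumption; the function scale below is a hypothetical target chosen only for the illustration:

import pdb

def scale(values, factor):
    result = []
    for v in values:
        result.append(v * factor)
    return result

pdb.set_trace()          # BREAK: suspend execution and prompt for commands
print(scale([1, 2, 3], 10))

# Typical commands entered at the (Pdb) prompt:
#   b scale        set a breakpoint at the scale function (BREAK)
#   c              resume execution (EXECUTE)
#   p values       display a variable (DUMP/CONTENTS)
#   !factor = 5    change a variable's value (SET)
#   w              show the current call chain (a TRACE of control flow)
#   q              terminate the session (EXIT)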
The capabilities of special interactive test and debugging aids can also be found in many implementations of compilers for such languages as FORTRAN, COBOL, and PL/I, which contain testing features added to the language.
Example. A critical section of code within a routine is to be tested. The code computes the values of three variables, X, Y, and Z, which later serve as input to other routines. To ensure that the values assigned to X, Y, and Z have been correctly computed in this section of code, an interactive test and debugging aid is used to test the code.
The code is initially instrumented with two BREAK commands: one is inserted immediately before the first statement and one immediately after the last statement of the section of code being tested. Preceding the second BREAK command, a CONTENTS command is also inserted to cause the contents of X, Y, and Z to be displayed after their appropriate values have been assigned.
The program containing the inserted commands is executed. At the first BREAK point, execution is halted and a prompt is issued to the user's terminal requesting a command. A SAVE command is entered at the terminal to save the present state of execution. A SET command is then entered to set the values of two variables, A and B, which are used to compute the values of X, Y, and Z. The EXECUTE command is entered to resume program execution.
At the end of the execution of the section of code under analysis, the pre-inserted CONTENTS command displays the computed values of X, Y, and Z. The pre-inserted BREAK command allows time for these values to be examined and gives the user the opportunity to enter new commands. At this time, a RESTORE command is entered to restore the computation state to the one previously saved by the SAVE command. The computation state returns to that which followed the first BREAK command, allowing the code under analysis to be tested with different input values. Different values for A and B are entered and the contents of X, Y, and Z are observed as before. This process is repeated several times with different, carefully selected values for A and B; the corresponding values of X, Y, and Z are closely examined each time. The results of several computations look suspect. Their input and output values are noted, and the code is more thoroughly examined. The program is finally terminated by entering the EXIT command at one of the two break points.
Effectiveness. To be an effective testing tool, an interactive test and debugging aid should be used with a disciplined strategy to guide the testing process. The tools can easily be misused if no testing methodology is associated with their use.
Effectiveness can depend on how easy it is to implement this technique. A desired automatable feature is the ability to trace data from a low-level program representation (i.e., object code) to a high-level program representation (i.e., source code).
Applicability. Interactive test and debugging aids can be applied to any type of source code. Most existing tools, however, are language-dependent and operating system-dependent. This technique is most applicable during unit, module, and integration test phases.
Maturity. Interactive test and debugging aids are extremely mature, and tools are available for most high-order languages.
User Training. The amount of learning required to use these tools is comparable to that required to use a text editor. To use the tool most efficiently, however, requires extensive practice.
Costs. Programs executed under an interactive test and debugging aid require more computing resources (e.g., execution time, main storage for diagnostic tables) than if executed under usual operation. The cost depends on the implementation of the tool. For example, those based on interpretive execution involve different costs from those driven by monitor calls.
RANDOM TESTING
Random testing is a black-box testing technique in which a program is tested by randomly sampling input. The sampling criteria may or may not consider the knowledge of the actual distribution of input. This technique is useful in making operational estimates of software reliability.
One striking advantage of random testing is that the test data generator used for the random test may create sequences of data and resulting events never envisaged by the designers--the kind of events that have an unpleasant way of surfacing as soon as the software project is in the operational environment.
Information Input. The input to this technique is the randomly generated input values.
Information Output. The primary output of this technique is the detection of errors that occur because certain sequences of events occur during program execution.
Outline of Method. This ad hoc testing technique involves several steps. A random input generator is necessary. The type of generator used depends on the type of input data. If the output is large, the method of checking for anomalies is usually automated. As the output from the random input is generated, it is compared with known or expected output values. All discrepancies are reported along with the input values that detect the discrepancy. If the output is small, the generated output and comparison output are compared visually. The investigator then attempts to explain the discrepancies.
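A minimal sketch of the approach follows (Python, with a library sort standing in for the program under test); each randomly generated input is run and the output is checked against the expected properties, sorted order and an unchanged multiset of values:

import random
from collections import Counter

def program_under_test(xs):
    return sorted(xs)                   # stand-in for the program being tested

for trial in range(1000):
    n = random.randint(0, 20)
    data = [random.randint(-100, 100) for _ in range(n)]
    out = program_under_test(data)
    ok = all(out[i] <= out[i + 1] for i in range(len(out) - 1)) \
         and Counter(out) == Counter(data)
    if not ok:
        print("discrepancy for input", data, "output", out)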
Example. Random testing may be used to detect errors in the software, as described previously, or it may be used in a simulation of the actual environment to estimate actual performance. In the latter case, the random data generator would be designed to provide data according to a scenario that describes the envisioned data environment. Such a scenario would include both valid and invalid data to test the software as extensively as possible.
Random testing should never be the principal testing approach in a test plan. It can be a practical approach in specific environments, however, and provide a valuable supplement to conventional testing techniques.
Effectiveness. This technique can detect a broad range of errors, but only with a low level of assurance; it detects no specific type of error with a high level of assurance, and in general it does not detect errors as efficiently as directed test techniques. A randomly selected collection of test cases has little chance of being an optimal subset of all possible input.
Applicability. This technique is applicable during the unit, module, and integration testing phases; it is also applicable for a software development project that has a dedicated computer.
Maturity. This test technique is straightforward in principle and concept and is fairly mature.
User Training. No specific skills are needed to implement this technique.
Costs. Depending on the environment, the cost of this technique varies widely. Executing a program with the randomly generated input is the primary overhead.
FUNCTIONAL TESTING
Functional testing is the validation of a program's functional correctness by execution under controlled input stimuli. The two techniques described in this section are specification-based functional testing and cause-effect graphing. Cause-effect graphing is a method used to assist specification-based functional testing in the area of test case and test data development.
Specification-Based Functional Testing
Functional testing involves generating test data based on knowledge of the program's functions and input. Many test cases can be generated this way for most programs. There are no metrics to indicate the thoroughness of testing or when it can stop, but certainly every function should be tested at least once.
Information Input. Data information and function information constitute the input for this technique.
Data Information. This technique requires the availability of detailed requirements and design specifications and, in particular, detailed descriptions of input data, files, and data bases. Both the concrete and abstract properties of all data must be described. Concrete properties include type, value ranges and bounds, record structures, and bounds on file data structure and data base dimensions. Abstract properties include subclasses of data that correspond to different functional capabilities in the system and subcomponents of compound data items that correspond to separate subfunctional activities in the system.
Functional Information. The requirements and design specifications must also describe the different functions implemented in the system.
Requirements functions correspond to the overall functional capabilities of a system or to subfunctions that are visible at the requirements stage and are necessary to implement overall capabilities. Different overall functions correspond to conceptually distinct classes of operations that can be carried out through the system. Different kinds of subfunctions can also be identified. Process descriptions in structured specifications, for example, describe data transformations that are visible at requirements time and that correspond to requirements subfunctions. Requirements subfunctions also occur implicitly in data base designs. Data base functions are used to reference, update, and create data bases and files.
The system designer must create both general and detailed functional constructs to implement the functions in the requirements specifications. Structured design techniques are particularly useful for identifying and documenting design functions. Designs are represented as an abstract hierarchy of functions. The functions at the top of the hierarchy denote the overall functional capabilities of a program or system and may correspond to requirements functions. Functions at lower levels correspond to the functional capabilities required to implement the higher-level functions. General design functions often correspond to modules or parts of programs that are identified as separate functions by comments. Detailed design functions may be created during the programming stage of system development and may correspond to single lines of code.
Information Output. The output to be examined depends on the nature of the tested function. If it is a straight input/output function, output values are examined. The testing of other classes of functions may involve the examination of the state of a data base or file.
Outline of Method. Functional testing is sometimes referred to as black-box testing, because detailed information about the program's internal structure need not be used to formulate the test data. Instead, test data is chosen in the following ways:
·         Data is chosen to explore whether the program correctly performs the intended functions—The functions should be described in the requirements and specifications for the program.
·         The input to the program is examined—Based on the quantities they represent and how the program functions ought to operate on them, the possible values for each input variable can be partitioned into subdomains. Test data sets are generated by taking combinations of samples from each subdomain (a small sketch follows this list).
·         A measure of the program's output behavior is defined—Test data is sought that drives this measure toward an undesirable value. Techniques from mathematical optimization can be used to do this.
·         Errors are detected manually through the examination of the program's output.
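A small sketch of subdomain sampling follows; the input variables, subdomains, and sample values are hypothetical and chosen only for illustration:

from itertools import product

# Hypothetical input variables for a screening function: age is partitioned
# into under-age, adult, and senior subdomains; income into zero, typical,
# and extreme subdomains. One sample is drawn from each subdomain.
age_samples = [17, 35, 70]
income_samples = [0, 40000, 10000000]

# Test data sets are generated by taking combinations of the samples.
for age, income in product(age_samples, income_samples):
    print("test case:", age, income)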
Functional testing is supported by two types of automated tools—test harnesses or drivers and stress-testing tools. A test harness provides an environment for testing individual software modules or groups of modules. The tool can fill in for missing program components, including a main program. Test harnesses are most useful in an interactive environment to start, terminate, or interrupt execution at an arbitrary point in a program. Most test harness tools have some debugging capabilities.
A stress-testing tool (e.g., the adaptive tester) can automatically generate test data. The tool tries to find input data that causes undesirable behavior in the test program. To do this, the tester must devise a numerical measure of program behavior (i.e., an objective function). Various techniques can be used to maximize (or minimize) the value of the objective function; most of these assume that the objective function has certain continuity properties.
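A toy sketch of the stress-testing idea follows, using simple random search in place of a formal optimization method; the routine under test, its input range, and the objective function are all invented for the illustration:

import random

def routine_under_test(x):
    return 1.0 / (x + 1e-9)            # hypothetical routine being stressed

def objective(x):
    # Numerical measure of undesirable behavior: how far the routine's
    # result drifts from the mathematically expected value 1/x.
    return abs(routine_under_test(x) - 1.0 / x)

worst_x, worst_val = None, -1.0
for _ in range(10000):
    x = random.uniform(1e-6, 1.0)
    val = objective(x)
    if val > worst_val:                # keep the input that behaves worst
        worst_x, worst_val = x, val

print("worst input found:", worst_x, "objective value:", worst_val)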
Examples. This section provides examples of two types of functional testing. Example 1 is a testing of requirements functions.
Application. A computerized dating system contains a sequential file of potential dates. Each client for the service submits a completed questionnaire that is used to find the five most compatible dates. Certain criteria must be satisfied before any potential date is selected, and it is possible that fewer than five dates—including no dates—might be found for a client.
Error. An error in the file-processing logic causes the program to select the last potential date in the sequential file whenever there is no potential date for a client.
Error Discovery. The number of dates found for each client is a dimension of the output data and has extremal values 0 and 5. If the "find-a-date" functional capability of the system is tested with client data for which no date should exist, the presence of the error is revealed.
Example 2 is a testing of detailed design functions.
Application. The designer of the computerized dating system in example 1 decides to process the file of potential dates for a client by reading in the records in sets of 50 records each. A simple function is designed to compute the number of record subsets.
Error. The number of subsets function returns the value 2 when there are fewer than 50 records in the file.
Error Discovery. The error is discovered when the function is tested over the extremal case for which it should generate the minimal output value 1. This error is not revealed (except by chance) when the program is tested at the requirements specification level. It also is not necessarily revealed unless the code implementing the function is tested independently and not in combination with the rest of the system.
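The error in example 2 is easy to reproduce in a sketch. The faulty formula below is hypothetical, since the original design is not shown, but testing the extremal case exposes the same symptom:

def number_of_subsets(record_count, subset_size=50):
    # Hypothetical faulty design: the count of full subsets is off by one,
    # so a file with fewer than 50 records reports 2 subsets instead of 1.
    full = record_count // subset_size + 1      # the "+ 1" does not belong here
    partial = 1 if record_count % subset_size else 0
    return full + partial

# Functional test over the extremal case: the minimal expected output is 1.
print(number_of_subsets(30))    # prints 2 and reveals the error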
Effectiveness. Studies have been carried out that indicate functional testing to be highly effective in detecting a range of errors in each of the major error categories.
Its use depends on specific descriptions of system input and output data and a complete list of all functional capabilities. The method is essentially manual and somewhat informal. If a formal language could be designed for describing all input and output data sets, a tool could be used to check the completeness of these descriptions. Automated generation of extremal, non-extremal, and special cases might be difficult because no rigorous procedure has been developed for this purpose.
For many errors it is necessary to consider combinations of extremal, non-extremal, and special values for functionally related input data variables. To avoid combinatorial explosions, combinations must be restricted to a small number of variables. Attempts have been made to identify important combinations, but there are no absolute rules, only suggestions and guidelines.
Because functional testing does not have a well-developed methodology or an objective measure of test thoroughness, its success depends heavily on the skill of the person conducting the tests. Functional testing often operates under a budget constraint, in which case efficiency in finding errors is of utmost importance.
Any error that prevents a program from operating correctly can be found through functional testing. Functional testing alone, however, is not useful for determining the efficiency of a program or for debugging. Functional testing cannot guarantee the absence of errors or that the code has been thoroughly tested.
Applicability. Functional testing can begin at the unit, module, or integration test phase if the units perform well-defined functions. Therefore, it can be used for both bottom-up and top-down development.
Maturity. Supporting tools (e.g., test harnesses or test drivers) are readily available or can easily be implemented.
User Training. It is necessary to develop expertise in the identification of extremal and special cases and to avoid the combinatorial explosions that may occur when combinations of extremal and special values for different data items are considered. It is also necessary to become skilled in the identification of specifications functions, although this process is simplified if a systematic approach is followed for the representation of requirements and design.
Costs. The cost of functional testing depends on the number of test runs made. This is true of both analyst and computer costs, because the setup costs are low. Test harnesses and stress-testing tools have very low overheads and can provide a saving over manual testing in computer and analysis costs.
Cause-Effect Graphing
Cause-effect graphing is a test-case design methodology that can be used with functional testing. It is used to systematically select a set of test cases that have a high probability of detecting program errors. This technique explores the input and combinations of input conditions of a program to develop test cases and does not examine the internal behavior or structure of a program. In addition, for each test case derived, the technique identifies the expected output. The input and output of the program are determined through analysis of the requirement specifications. These specifications are then translated into a Boolean logic network or graph. The network is used to derive test cases for the software under analysis.
Information Input. The information required as input to carry out this technique is a natural-language specification of the program to be tested. The specification should include all expected input and input combinations to the program, as well as expected output.
Information Output. Cause-effect graphing output consists of the following:
·         An identification of incomplete or inconsistent statements in the requirement specifications.
·         A set of input conditions (i.e., causes) on the software.
·         A set of output conditions (i.e., effects) on the software.
·         A Boolean graph that links the input conditions to the output conditions.
·         A limited-entry decision table that determines which input conditions result in each identified output condition.
·         A set of test cases.
·         The expected program results for each derived test case.
This output represents the result of performing the various steps recommended in cause-effect graphing.
Outline of Method. A cause-effect graph is a formal language translated from a natural-language specification. The graph itself is represented as a combinatorial logic network. The following steps should be taken to create a cause-effect graph to derive test cases:
·         Identify all requirements of the system and divide them into separate identifiable entities.
·         Carefully analyze the requirements to identify all the causes and effects in the specification. A cause is a distinct input condition; an effect is an output condition or system transformation (i.e., the effect that an input has on the state of the program or system).
·         Assign each cause and effect a unique number.
·         Analyze the semantic content of the specification and transform it into a Boolean graph linking the causes and effects:
§  Represent each cause and effect by a node identified by its unique number.
§  List all the cause nodes vertically on the left side of a sheet of paper; list the effect nodes on the right side.
§  Interconnect the cause and effect nodes by analyzing the semantic content of the specification. Each cause and effect can be in one of two states: true or false. Using Boolean logic, set the possible states of the causes and determine under what conditions each effect is present.
·         Annotate the graph with constraints describing combinations of causes or effects that are impossible because of syntactic or environmental constraints.
·         Through the methodical tracing of state conditions in the graph, the graph should be converted into a limited-entry decision table as follows. For each effect, trace back through the graph to find all combinations of causes that make the effect true. Each such combination is represented as a column in the decision table. The state of all other effects should also be determined for each such combination. Each column in the table represents a test case.
·         Convert the columns in the decision table into test cases.
Although this technique for creating test cases has not yet been totally automated, conversion of the graph to the decision table, the most difficult aspect of the technique, is an algorithmic process that could be automated by a program.
Causes and Effects of Error
Causes
1.  Character in column 1 is D.
2. Character in column 1 is L.
3. Character in column 2 is a digit.
Effects
50. Index section is displayed.
51. Error message A is displayed.
52. Error message B is displayed.
Table 8
Example. A data base management system requires that each file in the data base have its name listed in a master index identifying the location of each file. The index is divided into 10 sections. A small system is being developed that allows the user to interactively enter a command to display any section of the index at the terminal. Cause-effect graphing is used to develop a set of test cases for the system. The specification for this system is explained in the following paragraphs.
To display one of the 10 possible index sections, a command must be entered consisting of a letter and a digit. The first character entered must be a D (for display) or an L (for list), and it must be in column 1. The second character entered must be a digit (0 through 9) in column 2. If this command occurs, the index section identified by the digit is displayed on the terminal. If the first character is incorrect, error message A is printed. If the second character is incorrect, error message B is printed. The error messages are:
·         A: INVALID COMMAND.
·         B: INVALID INDEX NUMBER.
The causes and effects have been identified in Table 8. Each has been assigned a unique number.
A Boolean graph (see Figure 16) is constructed through analysis of the semantic content of the specification. Node 20 is an intermediate node representing the Boolean state of node 1 or node 2. The state of node 50 is true if the states of nodes 20 and 3 are both true. The state of node 20 is true if the state of node 1 or node 2 is true. The state of node 51 is true if the state of node 20 is not true. The state of node 52 is true if the state of node 3 is not true.
Figure 16. Boolean Graph
Nodes 1 and 2 are also annotated with a constraint that states that causes 1 and 2 cannot be true simultaneously. Table 9 shows the graph converted into a decision table.
Decision Table

                       Test Cases
Causes            1       2       3       4
    1             1       0       0
    2             0       1       0
    3             1       1               0
Effects
   50             1       1       0       0
   51             0       0       1       0
   52             0       0       0       1

Table 9
For each test case, the bottom of the table indicates which effect is present (indicated by a 1). For each effect, all combinations of causes that result in the effect are represented by the entries in the columns of the table. Blanks in the table mean that the state of the cause is irrelevant.
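For illustration only, the short sketch below (Python; not drawn from any particular cause-effect graphing tool) encodes the Boolean graph of Figure 16 and enumerates the cause combinations permitted by the constraint, showing which effects each combination produces. The four columns of Table 9 are drawn from these combinations:

from itertools import product

def effects(c1, c2, c3):
    # Boolean graph of Figure 16: node 20 = cause 1 OR cause 2;
    # effect 50 = node 20 AND cause 3; effect 51 = NOT node 20; effect 52 = NOT cause 3.
    node20 = c1 or c2
    return {50: node20 and c3, 51: not node20, 52: not c3}

for c1, c2, c3 in product([True, False], repeat=3):
    if c1 and c2:
        continue      # constraint: causes 1 and 2 cannot both be true
    result = effects(c1, c2, c3)
    print(int(c1), int(c2), int(c3), "->", {k: int(v) for k, v in result.items()})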
Each column in the decision table is converted into a test case, as shown in Table 10.

Test Cases

Test Case No.    Input    Expected Results
      1           D5      Index section 5 is displayed
      2           L4      Index section 4 is displayed
      3           B2      INVALID COMMAND
      4           DA      INVALID INDEX NUMBER

Table 10
Effectiveness. Cause-effect graphing can produce a useful set of test cases and can point out incompleteness and ambiguities in the requirement specification.
Applicability. Cause-effect graphing can be applied to generate test cases in any type of computing application when the specification is clearly stated and combinations of input conditions can be identified. Although manual application of this technique is tedious, long, and moderately complex, it could be applied to selected modules where complex conditional logic must be tested. This technique is applicable during algorithm confirmation through unit test phases.
Maturity. Cause-effect graphing, although mature, is not widely used, mainly because it has not been totally automated and manual application is time-consuming.
User Training. Cause-effect graphing is a mathematically based technique that requires some knowledge of Boolean logic. The requirement specification of the system must also be clearly understood to successfully carry out the process.
Costs. Manual application of this technique is extremely labor-intensive.


MUTATION TESTING
Mutation testing involves modifying actual statements of the program. Mutation analysis and error seeding, two examples of this technique, are described in the following sections.

Mutation Analysis
Mutation analysis is a technique for detecting errors in a program and for determining the thoroughness with which the program has been tested. It measures test data adequacy or the ability of the data to ensure that certain errors are not present in the program under testing. The method entails studying the behavior of a large collection of programs that have been systematically devised from the original program.
Information Input. The basic input required by mutation analysis is the original source program and a collection of test data sets on which the program operates correctly, and which the user considers to adequately and thoroughly test the program.
Information Output. The ultimate output of mutation analysis is a collection of test data sets and an assurance that the collection thoroughly tests the program. The mutation analysis process may arrive at this final state only after having exposed program errors and inadequacies in the original test data set collection. Therefore, errors detected, new program understanding, and additional test data sets can also be considered information output of the mutation analysis process.
Outline of Method. The essential approach taken in the mutation analysis of a program is to produce many versions, each devised by a trivial transformation of the original, and to subject each version to testing by the given collection of test data sets. Because of the nature of the transformations, it is expected that the devised versions are essentially different programs from the original, and the testing regimen should demonstrate that this is so. Failure of execution to produce different results indicates that the collection of test data sets is inadequate. This usually leads to greater understanding of the program and either the detection of errors or an improved collection of test data sets, or both.
A central feature of mutation analysis is the mechanism for creating the program mutations, the derived versions of the original program. The set of mutations that is generated and tested is the set of all programs differing from the original only in a small number (generally one or two) of textual details (e.g., a change in an operator, variable, or constant). Research indicates that larger numbers of changes contribute little or no additional diagnostic power.
The basis for this procedure is the assumption that program errors are not random phenomena but a result of lapses of human memory or concentration. Therefore, an erroneous program is expected to differ from the correct one only in a small number of details; if the original program is incorrect, the set of all programs created by making a small number of textual changes should include the correct program. A thorough collection of test data sets would reveal behavioral differences between the original incorrect program and the derived correct one.
Mutation analysis entails determining whether each mutant program behaves differently from the original program. If so, the mutant is considered incorrect. If not, the mutant must be studied carefully. It is entirely possible that the mutant is functionally equivalent to the original program. If so, its identical behavior is clearly benign. If not, the mutant is highly significant, as it certainly indicates an inadequacy in the collection of test data sets. It may furthermore indicate an error in the original program that went undetected because of the inadequate testing. Mutation analysis facilitates the detection of such errors by automatically raising the possibility of each such error and then demanding justification for concluding that the error has not been committed. Most mutations quickly manifest different behavior under exposure to any reasonable test data set collection, thereby demonstrating the absence of the error for which they were created in the original program. This forces detailed attention on those mutants that behave identically to the original and therefore on any actual errors.
If all mutations of the original program reveal different execution behavior, the program is considered to be adequately tested and correct within the limits of the assumption.
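For illustration only, the following sketch performs a miniature mutation analysis on a counting function analogous to the FORTRAN COUNT example that follows. (Python is used purely for exposition; the function, the mutations, and the test data sets are hypothetical.) Mutants whose behavior differs from the original on some test data set are "killed"; a mutant that remains "live" exposes a gap in the test data:

ORIGINAL = "def count_nonneg(a):\n    return sum(1 for x in a if x > 0)\n"

# Single-alteration mutants: each differs from the original in one textual detail.
MUTATIONS = [("x > 0", "x >= 0"), ("x > 0", "x < 0"), ("1", "2")]

TEST_SETS = [[1, -2, 3], [-1, -2, -3], [1, 2, 3]]

def behaviour(source):
    env = {}
    exec(source, env)                                  # build the function from its text
    return [env["count_nonneg"](data) for data in TEST_SETS]

expected = behaviour(ORIGINAL)
for old, new in MUTATIONS:
    mutant_source = ORIGINAL.replace(old, new, 1)
    status = "killed" if behaviour(mutant_source) != expected else "live"
    print(old, "->", new, ":", status)
# The live mutant (x > 0 versus x >= 0) shows that no test data set contains a
# zero value, so the collection of test data sets is inadequate for that case.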
Example. The following FORTRAN program counts the number of negative and nonnegative numbers in array A:
SUBROUTINE COUNT (A, NEG, NONNEG)
DIMENSION A(5)
NEG = 0
NONNEG = 0
DO 10 I = 1,5

IF (A(I).GT.0) NONNEG = NONNEG + 1
IF (A(I).LT.0) NEG = NEG + 1

10 CONTINUE
RETURN
END
and the collection of test data sets produced by initializing A in turn to:
         I                  II               III
         1                  1                 -1
        -2                  2                 -2
         3                  3                 -3
        -4                  4                 -4
         5                  5                 -5
Mutants might be produced based on the following alterations:
·         Changing an occurrence of any variable to any other variable, for example:
A to I
NONNEG to NEG
I to NEG
·         Changing an occurrence of a constant to another constant that is close in value, for example:
1 to 0
0 to 1
0 to -1
1 to 2
·         Changing an occurrence of an operator to another operator, for example:
NEG + 1 to NEG * 1
NEG + 1 to NEG - 1
A(I).GT.0 to A(I).GE.0
A(I).LT.0 to A(I).NE.0
Therefore, the set of all single-alteration mutants consists of all programs containing exactly one of the previous changes. The set of all double-alteration mutants consists of all programs containing a pair of the previous changes.
Clearly, many such mutants are radically different and would exhibit obviously different behavior. For example, changing variable I to A (or vice versa) renders the program uncompilable by most compilers. Similarly, changing NEG = 0 to NEG = 1 causes a different outcome for test case 1.
Changing A(I).GT.0 to A(I).GE.0 or A(I).LT.0 to A(I).LE.0 causes no changes in run-time behavior on any of the three test data sets. This rivets attention on these mutants and subsequently on the issue of how to count zero entries. It is apparent that the collection of test data sets was inadequate because it did not include any zero input values. Had it included one, it would have indicated that IF (A(I).GT.0) NONNEG = NONNEG + 1 should have been IF (A(I).GE.0) NONNEG = NONNEG + 1.
Therefore, mutation analysis has pointed out both this error and this weakness in the collection of test data sets. After the program and collection of test data sets are changed, all mutants behave differently, which raises confidence in the correctness of the program.
Effectiveness. Mutation analysis is an effective technique for detecting errors, but it requires combining an insightful analyst with automated tools. It is a reliable technique for demonstrating the absence of all possible mutation errors (e.g., those involving alteration, interchanging, or omission of operators or variables).
Automated tools are necessary because any program has an enormous number of mutations, each of which must be generated, exercised by the test data sets, and evaluated. This entails thousands of edit runs, compilations and executions. Tools are available, however, that operate on a special internal representation of the original program. This representation is readily and efficiently transformed into the various mutations and serves as the basis for rapid simulation of the mutants' executions, thereby avoiding the need for compilation and loading of each mutant.
    Human intervention, however, is still necessary. Mutants that behave identically to the original program must be scrutinized to determine whether the mutant is equivalent or whether the collection of test data sets is inadequate.
At the end of a successful mutation analysis, many errors may be uncovered and the collection of test data sets may be very thorough. Whether the absence of errors is established, however, must still be determined. Clearly all errors of mutation are detectable by mutation analysis; therefore, the absence of diagnostic messages or findings indicates the absence of these errors. Mutation analysis cannot, however, ensure the absence of errors that cannot be modeled as mutations.
Applicability. Mutation analysis is applicable to any algorithmic solution specification. This technique is applicable during unit and module test phases. As previously indicated, it can only be considered effective when supported by a body of sophisticated tools. Tools enabling analysis of FORTRAN and COBOL source text exist, and there is no reason why tools for other coding languages, as well as algorithmic design languages, could not be built.
Maturity. Mutation analysis is a fairly mature testing technique. The considerable amount of analyst time necessary to check output of mutant programs is its major drawback. Research and development of the technique is continuing, especially in the newer higher-order languages.
User Training. The technique requires becoming familiar with the philosophy and goals of this novel approach. The more familiar analysts are with the subject algorithmic solution specifications, the more effective they can be, because they may need to analyze both a collection of test data sets to determine how to augment it and two programs to determine whether they are equivalent.
Costs. Significant amounts of human analyst time are likely to be necessary to do mutation analysis. The computer time required is not likely to be excessive if the sophisticated tools described earlier are available.
Error Seeding
Error seeding evaluates the thoroughness with which a computer program is tested by purposely inserting errors into a supposedly correct program.
Information Input. The input required by this technique is the original source program and a collection of test data sets on which the program operates correctly and which the tester considers to test the program adequately and thoroughly. A desirable second input is information on the relative distribution of different types of errors usually occurring in this type of software.
Information Output. The output of this technique is an indication of the level of undetected errors existing in the program. That is, if many seeded errors are found, it is reasonable to assume that a correspondingly large number of the original, true errors have also been found.
Outline of Method. This technique is similar to mutation analysis in that the basis for the procedure is the assumption that program errors are not a random phenomenon but the result of lapses of human memory or concentration.
Seeding a program with errors consists of changing statements in the program source code; for example, changing an arithmetic operator, changing a relational operator in a condition statement, or substituting the name of one variable for another. It is expected that the introduction of these changes will cause the revised program to generate test results different from those of the original program. Failure to do so invites suspicion that the test data sets are insensitive. The seeded program and its output are usually debugged by personnel other than those who seeded the program with errors.
The proportion of seeded errors that remain undetected gives an indication of the corresponding proportion of true, undetected errors in the original program. That is, if 20% of the seeded errors were not detected by the test data sets, it suggests that 20% of the actual errors have not been detected by this set of tests.
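For illustration only, the arithmetic behind this proportional estimate can be written as a short function. (Python is used purely for exposition; the counts shown match the launch-sequence example later in this section.)

def estimate_remaining(real_found, seeded_total, seeded_found):
    detection_rate = seeded_found / seeded_total      # e.g., 14 / 20 = 0.70
    estimated_total = real_found / detection_rate     # e.g., 50 / 0.70 = about 71
    return round(estimated_total - real_found)        # estimated residual errors

print(estimate_remaining(real_found=50, seeded_total=20, seeded_found=14))   # about 21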
To give the error-seeding results greater validity, it is desirable to select the seeded errors in a manner consistent with the types of errors expected in the tested computer program.
The following issues should be considered:
·         To be realistic, the errors should be representative of those found in large programs in both type and frequency of occurrence.
·         The error types must be applicable to the software being tested and to its test environment.
·         To evaluate test tools that use program execution, one or more errors should lead to abnormal program behavior for at least some test data.
A static analyzer might be used to classify the source statements in the program to be tested. The random generation of errors could then be weighted to reflect the corresponding proportion of statement types in the program.

Example. The following describes a typical error seeding in the integration test phase. A launch sequence control program has completed its integration test successfully. Fifty errors were found and corrected. Historically, programs of this type have had many residual logical sequencing errors, so it was decided to assess the adequacy of testing for such errors through the use of the error-seeding technique.
An assortment of 20 errors similar to those made in the past was randomly introduced into the launch sequence control program. The integration tests were then run against the seeded program. Fourteen of the 20 seeded errors were detected; that is, 70% of the seeded errors were found. This indicated that the 50 errors found during integration testing were probably only 70% of the total number of original errors. Therefore, an additional 21 or 22 errors remained in the code.
The characteristics of the seeded errors that escaped detection were then analyzed and new integration tests were devised to detect those types of errors.
Effectiveness. Error seeding can be a reasonably effective technique to assess the adequacy of a set of software tests. The accuracy of the prediction of remaining errors is closely related to the realistic distribution of seeded error types and locations. The error-seeding process should be considered only a rough indicator of remaining errors.
Applicability. Error seeding is applicable to any type of software in which the presence of residual errors after testing would be intolerable. The more critical the application of the software, the more extensive the application of error seeding. This technique is applicable at the end of the unit, module, integration, and, less likely, system test phases. Because error seeding can provide test coverage assessment, it can be applied throughout the testing phases.
Maturity. Error seeding is a fairly mature theory, although it has not been applied widely as a standard procedure throughout the software development community.
User Training. No special knowledge is necessary to implement this technique.
Costs. Because the technique requires both test personnel and computer resources to test for the seeded errors after the basic testing is complete, along with follow-up analysis of the test results by the personnel who did the seeding, the cost of the technique is significant.
Currently, programs that automate a significant portion of the error-seeding technique seem unlikely, because human judgment is required in most steps of the technique. When expert systems become more cost-effective, the error-seeding approach may be a candidate for automation.




REAL-TIME TESTING
Real-time testing may involve testing on host computers with environment simulators or testing the software on the target computer in the actual hardware and software system or a simulation of the actual system.
Real-time testing on a host computer may be augmented by the use of a test bed. A test bed is a computer-based test environment used to test a software component. The test environment simulates the environment in which the software usually operates. A test bed allows full control of input and computer characteristics, processing of intermediate output without destroying simulated execution time, and full test repeatability and diagnostics. To be effective, the controlled circumstances of the test bed must truly represent the behavior of the system of which the software is a part.
In a similar way, real-time testing can be implemented with the target computer in a real-world simulation. This real world is constructed by installing the target computer in a configuration that includes as much of the actual interfacing subsystem hardware as possible. One or more computers simulate the remainder of the real environment. This entire configuration is called an environment simulator, and it has the same goal as a test bed: to provide the most realistic environment in which to test the real-time performance of the software on the embedded computer.
Information Input. Information input consists of the input signals characteristic of the real environment presented in a realistic time sequence. These signals may be generated by the actual interfacing equipment or by external computers simulating the action of that equipment.
Information Output. Information output consists of the results observed through execution of the software. This information is used as a preliminary means of determining whether the software will operate as intended in its real environment. Timing and logic problems may be exposed.
Outline of Method. To be of value, this environment must realistically reflect those properties of the system that affect or are affected by the operation of the software. The environment should simulate only those components in the system that the software requires as a minimum interface with the system. This permits testing to focus only on the software component for which the test bed is built.
Test beds are built through the consideration of and proper balance among the following factors:
·         The amount of realism required by the test bed to properly reflect the operation of system properties.
·         Resources available to build the test bed.
·         The ability of the test bed to focus only on the software being tested.
Test beds come in many forms, depending on the level of testing desired. For single module testing, a test bed may consist merely of test data and a test driver or harness, which feeds input data to the program module being tested, causes the module to be executed, and collects the output generated during the program execution. If a completed but non-final version of software is to be tested, the test bed may also include stubs, dummy routines that simulate the operation of a module that is invoked within a test. Stubs can be as simple as routines that automatically return on a call, or they can be more complicated and return simulated results. The final version of the software may be linked with other software subsystems in a larger total system. The test bed for one component in the system may consist of those system components that directly interface with the component being tested.
As illustrated in the previous examples, test beds permit the testing of a component of a system without requiring the availability of the full, complete system. They merely supply the input required by the software component to be executed and provide a repository for output to be placed for analysis. In addition, test beds may contain monitoring devices that collect and display intermediate output during program execution. In this way, test beds provide the means of observing the operation of software as a component of a system without requiring the availability of other system components, which may be unreliable.
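For illustration only, the following sketch shows a minimal test driver and stub of the kind described above. (Python is used purely for exposition; the module under test, the stub, and the test data are hypothetical.)

def lookup_altitude(sensor_id):
    # Stub: stands in for an unavailable interfacing module and returns
    # canned, simulated readings.
    return {"R1": 10000, "R2": 12500}[sensor_id]

def track_update(sensor_id, correction):
    # Module under test: combines a sensor reading with a correction value.
    return lookup_altitude(sensor_id) + correction

def driver():
    # Test driver: feeds input data to the module, executes it, and collects
    # the output for comparison with expected results.
    test_cases = [("R1", 250, 10250), ("R2", -500, 12000)]
    results = []
    for sensor_id, correction, expected in test_cases:
        actual = track_update(sensor_id, correction)
        results.append((sensor_id, correction, expected, actual, expected == actual))
    return results

for record in driver():
    print(record)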
Example. An airborne tracking system must be able to process a given rate of radar return messages, apply the required tracking algorithms, and control the displays to radar operators on board the surveillance aircraft. Because actual flight time is prohibitively expensive for all testing, a real-time simulator test bed is constructed to simulate the radar and other subsystems with which the airborne operational computer program must perform.
To simulate the environment, a second computer is programmed to provide the radar return messages in real time and in general to simulate the actual environment in which the on-board computer would operate in the air during a mission. This program in the second computer is the test driver software, and the second computer is called the environment simulator. The second computer also records the output of the operational computer program for later analysis. To simulate the environment completely, the environment simulator may also consist of other airborne hardware programmed to present a realistic interface to the operational software being tested.
The test driver then dumps the recorded information from the output file to a printer so the output can be analyzed and verified for correctness. When the volume of output data is large, it is also possible that post-processing software could be written to analyze the recorded output and compare it with the expected results.
Effectiveness. The use of test beds has proven to be a highly effective and widely used technique to test the operation of software. The use of test drivers or test harnesses, in particular, is one of the most widely used testing techniques. This technique is able to detect a wide range of errors.
Applicability. This method is applicable from program through system test and for all types of computing applications.
Maturity. This mature technique is widely used.
User Training. To build an effective test bed, it is necessary to develop a solid understanding of the software and its dynamic operation in a system. This understanding should help determine which parts of the test bed require the most attention during its construction. In addition, knowledge of the dynamic nature of a program in a system is required in gathering useful intermediate output during program execution and in properly examining these results.
Costs. The amount of realism desired in a test bed is the largest factor affecting cost. Building a realistic test bed may require the purchasing of new hardware and the development of additional software to properly simulate an entire system. In addition, these added resources may be so specialized that they may seldom, if ever, be used again in other applications. Therefore, very sophisticated test beds may not prove to be highly cost-effective.


SYMBOLIC TESTING
Symbolic testing involves the execution of a program from a symbolic point of view. Symbolic testing is applied to program paths and can be used to generate expressions that describe the cumulative effect of the computations occurring in a program path. It can also be used to generate a system of predicates that describes the subset of the input domain that causes a specified path to be traversed. The programmer is expected to verify the correctness of the output generated by symbolic execution in the same way that output generated by executing a program over actual values is verified.
Information Input. The input to this technique consists of:
·         Source code – This method requires the availability of the program source code.
·         Program paths – The path or paths through the program that are to be symbolically evaluated must be specified. The paths may be specified directly by the tester or, in some symbolic evaluation systems, selected automatically.
·         Input values – Symbolic values must be assigned to each of the input variables for the path or paths that are to be symbolically evaluated. The tester may be responsible for selecting these values, or the symbolic evaluation system may select them automatically.
Information Output. The output of this technique consists of:
·         Values of variables – The variables whose final symbolic values are of interest must be specified. Symbolic execution results in the generation of expressions that describe the values of these variables in terms of the dummy symbolic values assigned to input variables.
·         System of predicates – Each branch predicate that occurs along a program path constrains the input that causes that path to be followed. The symbolically evaluated system of predicates for a path describes the subset of the input domain that causes that path to be followed.
Outline of Method. The symbolic execution of a path is carried out by symbolically executing the sequence of assignment statements occurring in the path. Assignment statements are thus executed by symbolically evaluating the expressions on the right-hand side of the assignment. The resulting symbolic value becomes the new symbolic value of the variable on the left-hand side. An arithmetic or logical expression is symbolically executed by substituting the symbolic values of its variables for the variables themselves.
The branch conditions or predicates that occur in conditional branching statements can be symbolically executed to form symbolic predicates. The symbolic system of predicates for a path can be constructed by symbolically executing both assignment statements and branch predicates during symbolic path execution. The symbolic system of predicates consists of the sequences of symbolic predicates that are generated by the execution of the branch predicates.
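For illustration only, the following sketch symbolically executes a short, hypothetical three-statement path. (Python with the sympy library is used purely to hold and print the symbolic expressions; it is not a symbolic execution system.)

import sympy as sp

X, Y = sp.symbols("X Y")               # dummy symbolic values for the input variables
state = {"x": X, "y": Y}               # symbolic program state

# Path:  t = x + y ;  branch "t > 0" taken ;  z = t*t - y
state["t"] = state["x"] + state["y"]
path_predicates = [sp.Gt(state["t"], 0)]          # symbolic branch predicate
state["z"] = state["t"] * state["t"] - state["y"]

print("z =", state["z"])                          # (X + Y)**2 - Y
print("path predicate:", path_predicates[0])      # X + Y > 0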
Symbolic execution systems are used to facilitate this technique. All symbolic execution systems must contain facilities for selecting program paths to be symbolically executed, symbolically executing paths, and generating the required symbolic output.
Three types of path selection techniques have been used: interactive, static, and automatic. In the interactive approach, the symbolic execution system returns control to the user each time a decision must be made as to which branch to take during the symbolic execution of a program. In the static approach, the user specifies the paths to be executed in advance. In the automatic approach, the symbolic execution system attempts to execute all program paths that have a consistent symbolic system of predicates. A system of predicates is consistent if it has a solution.
The details of the symbolic execution algorithms used in different systems are largely technical. Beyond such technical details, symbolic execution systems may also differ in the types of symbolic output they generate. For example, some systems contain facilities for solving systems of branch predicates. Such systems are capable of automatically generating test data for selected program paths (i.e., program input data that cause the path to be followed when the program is executed over that data).
Example. An example of symbolic execution follows.
Application. A FORTRAN program called SIN was written to compute the sine function using the Maclaurin series.
Errors. The program contained three errors: an uninitialized variable, the use of the expression -1**(I/2) instead of (-1)**(I/2), and the failure to add the last term computed in the series to the final computed sum.
Different paths through SIN correspond to different numbers of iterations of the loop in the program that is used to compute terms in the series. The symbolic output shown below was generated by symbolically evaluating the path that involves exactly three iterations of the loop:
PREDICATES:
          (X**3/6).GE.E
          (X**5/120).GE.E
          (X**7/5040).LT.E
OUTPUT:
          SIN = ?SUM - (X**3/6) - (X**5/120)
          Symbolic output for SIN
Error Discovery. The errors in the program are discovered by comparing the symbolic output with the standard formula for the Maclaurin series. The symbolic evaluator that was used to generate the output represents the value of an uninitialized variable with a question mark and the name of the variable. The error involving the expression (-1)**(I/2) results in the generation of the same rather than alternating signs in the series sum. The failure to use the last computed term can be detected by comparing the predicates for the symbolically evaluated path with the symbolic output value for SIN.
Effectiveness. Symbolic evaluation is most effective in detecting computation errors but also detects logic errors and data-handling errors. This technique is also effective in assisting the development of test data.
One of the primary uses of symbolic evaluation is in raising the confidence level in a program. Correct symbolic output expressions confirm to the tester that the code carries out the desired computations.
Applicability. This method is useful primarily for programs written in languages involving operations that can be represented in a concise, formal way. Most symbolic evaluation systems that have been built are for use with such algebraic programming languages as FORTRAN and PL/I. Algebraic programs involve computations that can be easily represented with arithmetic expressions. It is difficult to generate symbolic output from programs that involve complex operations with such wordy representations as the REPLACE and MOVE CORRESPONDING operations in COBOL.
Symbolic execution is most feasible when used with small segments of code. The degree of detail required in its application limits the effectiveness of the technique in testing large-scale programs. This technique is applicable during algorithm confirmation, design verification, unit, and module test phases.
Maturity. Symbolic evaluation and associated symbolic evaluation systems are in the developmental and experimental phases. Their use requires specialized skills. Current tools do not remove enough of the mechanical burden from the tester to allow widespread use of the technique.
User Training. It takes a certain amount of practice to choose paths and parts of paths for symbolic evaluation. The tester must avoid the selection of long paths or parts of paths that result in the generation of expressions that are so large that they are unreadable. If the symbolic evaluation system being used gives the tester control over the types of expression simplification that are carried out, the tester must learn to use this in a way that results in the generation of the most revealing expressions.
Costs. Storage and execution time costs for symbolic evaluation have been estimated in terms of program size, path length, number of program variables, and the cost of interpreting (rather than compiling and executing) a program path. The storage required for symbolically evaluating a path of length P in a program with S statements containing N variables is estimated to be on the order of 10(P + S + N). For execution time, C1 is the length of a program path, C2 is the average cost of interpreting a statement in a program path, Exp is the cost of symbolically evaluating a function, Cons is the cost of checking the consistency (i.e., solvability) of a system of symbolic predicates, and Cond is the cost of evaluating a condition in a conditional statement; Cons and Cond are expressed in units of the cost of interpreting a statement in the program. The cost (in execution time) of symbolically executing a program path is then estimated to be on the order of C1 * C2 * (2 + Exp + Cons/10 + Cond/100).


FORMAL ANALYSIS
The purpose of formal analysis is to apply the formality and rigor of mathematics to the task of proving the consistency between an algorithmic solution and a rigorous, complete specification of the intent of the solution.
Information Input. The input required consists of the solution specification and the intent specification. The solution specification is algorithmic in form and is often executable code. The intent specification is descriptive in form, invariably consisting of assertions, usually expressed in predicate calculus.
Additional input may be required depending on the rigor and specific mechanisms to be used in the consistency proof. For example, the language semantics used to express the solution specification are required and must be supplied to a degree consistent with the rigor of the proof being attempted. Similarly, simplification rules and rules of inference may be required as input if the proof process is to be completely rigorous.
Information Output. The proof process may terminate with a successfully completed proof of consistency, a demonstration of inconsistency, or an inconclusive finding. In the first two cases, the proofs themselves and the proven conclusion are the output. In the third case, any fragmentary chains of successfully proven reasoning are the only meaningful output. As expected, their significance is highly variable.
Outline of Method. The usual method used in carrying out formal verification is Floyd's method of inductive assertions or a variant thereof. This method entails the partitioning of the solution specification into algorithmically straight-line fragments by means of strategically placed assertions. This partitioning reduces the proof of consistency to the proof of a set of smaller, generally much more manageable lemmas.
Floyd's method dictates that the intent of the solution specification be captured by two assertions--the input assertion, describing the assumptions about the input, and the output assertion, describing the transformation of the input intended to be the result of the execution of the specified solution. In addition, intermediate assertions must be fashioned and placed within the body of the solution in such a way that every loop in the solution specification contains at least one intermediate assertion. Each such intermediate assertion must completely express the transformations that should have occurred or should be occurring at the point of placement of the assertion. The purpose of so placing the assertions is to ensure that every possible program execution is decomposable into a sequence of straight-line algorithmic specifications, each of which is bounded on either end by an assertion. If it is known that each terminating assertion is necessarily implied by executing the specified algorithm under the conditions of the initial assertion, it can be shown that the entire execution behaves as specified by the input/output assertions and therefore as intended.
Floyd's method directs that a set of lemmas be proved. This set consists of one lemma for each pair of assertions that is separated by a straight-line algorithmic specification and no intervening assertion. For such an assertion pair, the lemma states that, under the assumed conditions of the initial assertion, execution of the algorithm specified by the intervening code necessarily implies the conditions of the terminating assertion. Proving all such lemmas establishes what is known as partial correctness. Partial correctness establishes that whenever the specified solution process terminates, it has behaved as intended. Total correctness is established by proving also that the specified solution process must always terminate. Termination is in general an undecidable question, and therefore its resolution is invariably approached through heuristics.
In the previous procedure, the ability to prove the various specified lemmas is clearly pivotal. This can be done to varying degrees of rigor, resulting in proofs of varied degrees of reliability and trustworthiness. For the greatest degree of trustworthiness, the solution specification, intent specification, and rules of reasoning must all be specified with complete rigor and precision. The principal difficulty here lies in specifying the solution with complete rigor and precision. This entails specifying the semantics of a specification language and the functioning of any actual execution environment with complete rigor and precision. Such complete details are often difficult or impossible to adduce. Moreover, they are, when available, generally voluminous, resulting in long and intricate lemmas.

Example. As an example of what is entailed in a rigorous formal verification activity, the specification of a bubble sort procedure is discussed. The intent of the bubble sort must first be captured by an input/output assertion pair. Next, the bubble sort algorithm contains two nested loops, which means that two additional intermediate assertions might be fashioned, or perhaps one particularly well-placed assertion might suffice. In the former case, eight lemmas would then need to be established: one corresponding to each of the two possible paths from the initial assertion to each intermediate assertion, one corresponding to each of the two paths from an intermediate assertion back to itself, one for each of the two possible paths from one intermediate assertion to the other, and finally one for each of the two possible paths from an intermediate assertion to the terminating assertion. Each lemma would need to be established through rigorous mathematical logic. Finally, a proof of necessary termination would need to be fashioned.
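As a rough illustration only, the following sketch shows where Floyd-style assertions might be placed in a bubble sort, using the single well-placed intermediate assertion mentioned above. (Python is used purely for exposition; the runtime assert statements merely approximate the lemmas that a formal proof would establish.)

def bubble_sort(a):
    # Input assertion: a is a list of mutually comparable elements.
    assert isinstance(a, list)
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            # Intermediate assertion, placed inside the innermost loop so that
            # every loop is cut: a[j] is the largest of a[0..j], and the last i
            # elements already hold the i largest values in sorted order.
            assert a[j] == max(a[:j + 1])
            assert a[n - i:] == sorted(a)[n - i:]
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    # Output assertion: the list is now in nondecreasing order.
    assert all(a[k] <= a[k + 1] for k in range(n - 1))
    return a

print(bubble_sort([3, 1, 2]))   # [1, 2, 3]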

Effectiveness. The effectiveness of formal verification has been attacked on several grounds. Fundamentally, formal verification can establish consistency only between intent and solution specification; an inconsistency can indicate error in either or both. And failing to achieve total rigor and detail increases the possibility of error.
The amount of detail also necessitates large, complex lemmas. These, especially when proved through complex, detailed rules of inference, produce very large, intricate proofs that are highly prone to error. Complicating this further is that these proofs are generally attempted through the use of first-order predicate calculus. Because validity in this system is undecidable, incorrect theorems cannot be expected ever to be reduced to obvious absurdities. Therefore, after a long and unsuccessful effort, the prover may still not know whether the lemma in question is correct.
Finally, formal verification of actual programs is further complicated by the need to rigorously express the execution behavior of the actual computing environment that executes the program. This, however, is extremely difficult to do. Consequently, the execution environment is generally modeled incompletely and imperfectly, thereby restricting the validity of the proofs in ways that are difficult to determine.
Nevertheless, a correctly proven set of lemmas establishing consistency between a complete specification and a solution specification whose semantics are accurately known and expressed conveys the greatest assurances of correctness obtainable. This ideal of assurance seems best attainable by applying automated theorem provers to design specifications, rather than to code.
This technique detects computation and logic errors.
Applicability. Formal verification can be applied to determine the consistency between any algorithmic solution specification and any intent specification. The trustworthiness of the results is highly variable, depending primarily on the rigor with which the specifications are expressed and the proofs carried out. This technique is applicable during the algorithm confirmation or design verification test phases.
Maturity. Formal analysis is still in the research and development phases.
User Training. As noted, the essence of this technique is mathematical. A considerable amount of mathematical training and expertise are necessary to produce reliable or trustworthy results with this technique.
Costs. When seriously applied, this technique must be expected to consume significant amounts of the time and effort of highly trained, mathematically proficient personnel. Considerable staff time expense must be expected.
As noted earlier, human effectiveness can be improved considerably through the use of such automated tools as theorem provers. Such tools, however, can consume considerable computer resources, resulting in large operational costs.
SUPPORT TECHNIQUES
Support techniques facilitate the software testing process but cannot detect specific types of errors when used alone.
TEST DATA GENERATORS
Test data generators are tools that generate test data to exercise a target program. These tools may generate data through analysis of the program to be tested or through analysis of the expected input in its usual operating environment. Test data generators may use numerical integrators and random number generators to create the data.


TEST RESULT ANALYZERS
Test result analyzers are used in situations in which the software cannot be analyzed completely as it is being tested. Typically, output data resulting from execution is written on tape or disk and later analyzed by the software test result analyzer. An example of the need for such post processing is the testing of a real-time computer program that must write a history tape of the system. When the program is run in a simulator, it may be impossible to check the history tape while the simulation test is being run. After the test, a test result analyzer program reads and analyzes the history tape to ensure that it was written in the required format and that it correctly recorded the history of the simulated system. Another example of a test result analyzer is the comparison of the actual results of execution with the expected results.
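For illustration only, the following sketch shows a minimal post-processing analyzer that compares recorded results with expected results. (Python is used purely for exposition; the record format and the expected values are hypothetical.)

EXPECTED = {"T+0": "ARMED", "T+1": "IGNITION", "T+2": "LIFTOFF"}

def analyze(history_lines):
    # Each recorded line is assumed to look like "T+1 IGNITION".
    mismatches = []
    for line in history_lines:
        timestamp, actual = line.split(maxsplit=1)
        if EXPECTED.get(timestamp) != actual:
            mismatches.append((timestamp, EXPECTED.get(timestamp), actual))
    return mismatches

recorded = ["T+0 ARMED", "T+1 IGNITION", "T+2 ABORT"]
print(analyze(recorded))    # [('T+2', 'LIFTOFF', 'ABORT')]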


TEST MANAGEMENT SOFTWARE
This support software performs a configuration management function by creating a program test library containing the records of all tests, including the test software, test data, software version numbers, and corresponding test results. Such software permits more precise control and recording of the tests and may be appropriate for testing large computer programs.
TEST COMPLETION CRITERIA SOFTWARE
This support software uses statistical or reliability prediction techniques in conjunction with user-specified criteria on such factors as cost, desired software product reliability, and schedule, to help determine reasonable test completion criteria. Such software usually incorporates some of the test completion criteria discussed in Section 4, Part 1, Step 3, Task 1, The Task Process, subtask 7. Most of this software should be used in an advisory capacity, because the algorithms used are not rigorous.
TEST DRIVERS AND TEST HARNESSES
Test drivers provide the facilities needed to execute a program (e.g., input or files, commands). The input data files for the system must be loaded with data values representing the test situation or events to yield recorded data to evaluate against expected results. Test drivers permit generation of data in external form to be entered automatically into the system.
Test harnesses are more generalized test drivers. Test harnesses install the software to be tested in a test environment, insert stubs to simulate the behavior of missing modules, and provide input data. Some test harnesses are programmable so that they can be tailored for many kinds of software. Test harnesses usually provide a more complete environment than a test driver, although the distinction between the two is sometimes blurred.




COMPARATORS
A comparator is a program used to compare two versions of source data to determine whether the two versions are identical or to specifically identify where any differences in the versions occur. Comparators are most effective during software testing and maintenance when periodic modifications to the software are anticipated.
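For illustration only, a comparator of this kind can be approximated with the difflib module from the Python standard library; the two source versions below are hypothetical:

import difflib

version1 = ["NEG = 0", "NONNEG = 0", "IF (A(I).GT.0) NONNEG = NONNEG + 1"]
version2 = ["NEG = 0", "NONNEG = 0", "IF (A(I).GE.0) NONNEG = NONNEG + 1"]

for line in difflib.unified_diff(version1, version2,
                                 fromfile="version1", tofile="version2", lineterm=""):
    print(line)   # lines prefixed with - and + pinpoint where the versions differ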

MISCELLANEOUS TESTING METHODS
The miscellaneous testing methods described in this section have not been included in the test technique category. Requirements tracing and requirements analysis facilitate the front end of the software life cycle. Regression testing uses test techniques described in this appendix to detect spurious errors resulting from software modifications.
REQUIREMENTS TRACING
Requirements tracing provides a means to verify that the software of a system addresses each requirement of that system and that the testing of the software produces adequate and appropriate responses to those requirements.
Information Input. The information needed to perform requirements tracing consists of a set of system requirements and the software that can satisfy the requirements.
Information Output. The information output by requirements tracers consists of the correspondence found between the system requirements and the software that is intended to realize these requirements.
Outline of Method. Requirements tracing generally serves two major purposes. The first is to ensure that each specified requirement of a system is addressed by an identifiable element of the system software. The second is to ensure that the testing of that software produces results that are adequate responses in satisfying each of these requirements.
A common technique for assisting in making these assurances is the use of test evaluation matrices. These matrices are visual schemes for identifying which requirements of a system have been adequately and appropriately addressed and which have not. There are two basic forms of test evaluation matrices. The first form identifies the mapping between the requirement specifications of a system and the modules of that system. This matrix determines whether each requirement is realized by a module in the system and, conversely, whether each module is directly associated with a specific system requirement. If the matrix reveals that no module addresses a requirement, that requirement has probably been overlooked in the software design. If a module does not correspond to any requirement of the system, that module is superfluous. In either case, the design of the software must be further scrutinized, and the system must be modified accordingly to effect an acceptable requirements-design mapping.
The second form of a test evaluation matrix provides a similar mapping, except the mapping exists between the modules of a system and the set of test cases performed on the system. This matrix determines which modules are invoked by each test case. Used with the previous matrix, it also determines which requirements are demonstrated to be satisfied by the execution of a particular test case in the test plan. During actual code development, it can be used to determine which requirement specifications relate to a particular module. In this way, it is possible to have each module print out a message during execution of a test, indicating which requirement is referenced by the execution of this module. The code module itself may also contain comments about the applicable requirements.
The two matrices should be used together to be most effective. The second matrix is built before software development. After the software has been developed and the test cases have been designed (based on this matrix), it is necessary to determine whether the execution of the test plan actually demonstrates satisfaction of the requirements of the software system. Through an analysis of the results of each test case, the first matrix can be constructed to determine the relationship that exists between the requirements and software reality.
The first matrix is useful mainly for analyzing the functional requirements of a system. The second matrix is useful in analyzing the performance, interface, and design requirements of the system, in addition to the functional requirements. Both are often used in support of a more general requirements tracing activity (e.g., preliminary and critical design reviews). This procedure is used to ensure verification of the traceability of all the previously mentioned requirements to the system design. In addition to the use of test evaluation matrices, these design reviews may include the tracing of individual subdivisions in the software design document back to applicable specifications in the requirements document. This is a constructive technique used to ensure verification of requirements traceability.
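For illustration only, the essence of the first test evaluation matrix can be sketched as follows, mirroring the payroll example that follows. (Python is used purely for exposition; the requirement identifiers and module names are hypothetical.)

requirements = {"R1": "compute pay", "R2": "adjust tax rate at age 65", "R3": "print pay stub"}
modules = {"PAYCALC": ["R1"], "STUBPRINT": ["R3"], "AUDITLOG": []}   # module -> requirements it addresses

covered = {r for reqs in modules.values() for r in reqs}
unaddressed = [r for r in requirements if r not in covered]
superfluous = [m for m, reqs in modules.items() if not reqs]

print("Requirements with no module:", unaddressed)     # ['R2'] -- overlooked in the design
print("Modules tied to no requirement:", superfluous)  # ['AUDITLOG'] -- possibly superfluous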
Example. An example of requirements tracing follows.
Application. A new payroll system is to be tested. Among the requirements of this system is the specification that all employees age 65 or older receive semi-retirement benefits and have their Social Security tax rate readjusted. To ensure that these particular requirements are appropriately addressed in the system software, test evaluation matrices have been constructed and filled out for the system.
Error. An omission in the software causes the Social Security tax rate of individuals age 65 or older to remain unchanged.
Error Discovery. The test evaluation matrices reveal that the requirement that employees age 65 or older have their Social Security tax rate readjusted has not been addressed by the payroll program. No module in the system had been designed to respond to this specification. The software is revised accordingly to accommodate this requirement, and a test evaluation matrix is used to ensure that the added module is tested in the set of test cases for the system.
Effectiveness. Requirements tracing is a highly effective technique for discovering errors during the design and coding phases of software development. This technique has proved to be a valuable aid in verifying the completeness, consistency, and testability of software. If a system requirement is modified, it also helps greatly in retesting software by clearly indicating which modules must be rewritten and retested. Requirements tracing is effective in detecting errors early in the software development cycle that could be extremely expensive if not discovered until later.
Applicability. This technique is generally applicable in large- or small-system testing and for all types of computing applications during the design and code phases. If the system requirements themselves are not clearly specified and documented, however, proper requirements tracing can be difficult.
Maturity. Requirements tracing is extremely mature and widely used and has proved to be effective.
User Training. A clear understanding of the requirements of the system is essential. More complex systems result in a corresponding increase in required learning.
Costs. No special equipment or tools are needed to carry out this technique. The major cost in requirements tracing is staff time.


REQUIREMENTS ANALYSIS
The requirements for a system are usually specified with a formal language that may be graphic or textual. Requirements analysis checks for syntactical errors in the requirements specifications and produces a useful analysis of the relationships among system input, output, processes, and data. Logical inconsistencies or ambiguities in the specifications can also be identified by requirements analysis.
Information Input. The form and content of the input varies greatly for different requirements languages. Generally, there are requirements regarding what the system must produce (i.e., output) and what types of input it must accept. There are usually specifications describing the types of processes or functions that the system must apply to the input to produce the output. Additional requirements may concern timing and volume of input, output, and processes as well as performance measures regarding response time and operational reliability. In some cases, all input is textual, whereas some languages use only graphic input from a display terminal (e.g., boxes represent processes and arrows between boxes represent information flow).
Information Output. Nearly all analysis produces error reports showing syntactical errors or inconsistencies in the specifications. For example, the syntax may require that the output from a process at one level of system decomposition must include all output from a decomposition of that process at a more detailed level. Similarly, for each system output there should be a process that produces that output. Any deviation from these rules results in error diagnostics.
Each requirements analysis produces a representation of the system that indicates static relationships among system input, output, processes, and data. It also represents and analyzes dynamic relationships. This may be a precedence relationship (e.g., process A must execute before process B). It may also include information regarding how often a given process must execute to produce the volume of output required. This technique produces a detailed representation of relationships between different data items. This output can sometimes be used for developing a database for the system. Requirements analysis goes even further and provides a mechanism for simulating the requirements through the use of the generated system representation, including the performance and timing requirements.
Outline of Method. The user must provide the requirements specifications as input for the analysis. The analysis is automated; the user must interpret the results. Often the user can request selected types of output (e.g., an alphabetical list of all the processes or a list of all the data items of a given type). This analysis can be implemented either interactively or in a batch mode. Once the requirements specifications are considered acceptable, requirements analysis provides the capability for simulating the requirements. It is necessary that the data structure and data values generated from the requirements specifications be used as input to the simulation; otherwise, the simulation may not truly represent the requirements.
Example. PROCESS B produces two files named H2 and H3 from an input file named M2. PROCESS D accepts files H2 and H3 as input and produces files J3 and J6 as output. In addition, PROCESS G, a subprocess of PROCESS D, accepts file H3 as input and produces file J6. Then the following pseudo-specification statements might be used to describe the requirements. (These requirements are close to design, but this is often not the case.)

PROCESS B

USES FILE M2
PRODUCES FILES H2, H3

PROCESS D

USES FILES H2, H3
PRODUCES FILES J3, J6

PROCESS G

SUBPROCESS OF PROCESS D
USES FILE H3
PRODUCES FILE J6

The requirements specification implies a certain precedence of operation (e.g., PROCESS D cannot execute until PROCESS B has produced files H2 and H3). Detailed descriptions of what each process does would usually be included but are omitted here for brevity. Requirements analysis would probably generate a diagnostic because the statement for PROCESS D fails to indicate that it includes the subprocess G. A diagnostic would also be generated unless there are other statements that specify that file M2, needed by PROCESS B, is available as an existing file or is produced by some other process. Similarly, other processes must be specified that use files J3 and J6 as input unless they are specified as files to be output from the system. Otherwise, additional diagnostics are generated. Some checks are similar to data flow analysis for a computer program. For large systems, however, the analysis of requirements becomes extremely complex if requirements for timing and performance are included and if timing and volume analysis are carried out. (Volume analysis is concerned with how often various processes must execute if the system is to accept or produce a specified volume of data in a given period of time.)
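A minimal sketch in Python (not part of the original material) of the kind of data-flow consistency checks described above, applied to the pseudo-specification; the statement syntax handled and the diagnostic wording are assumptions for illustration only.

import re

spec = """
PROCESS B
USES FILE M2
PRODUCES FILES H2, H3
PROCESS D
USES FILES H2, H3
PRODUCES FILES J3, J6
PROCESS G
SUBPROCESS OF PROCESS D
USES FILE H3
PRODUCES FILE J6
"""

used, produced = set(), set()
for line in spec.splitlines():
    m = re.match(r"(USES|PRODUCES) FILES? (.+)", line.strip())
    if m:
        target = used if m.group(1) == "USES" else produced
        target.update(name.strip() for name in m.group(2).split(","))

# Files used but never produced must be declared as existing external input.
for name in sorted(used - produced):
    print("Diagnostic: file", name, "is used but never produced")

# Files produced but never used must be declared as system output.
for name in sorted(produced - used):
    print("Diagnostic: file", name, "is produced but never used")

Run on the statements above, this flags M2 (used by PROCESS B but not produced within the specification) and J3 and J6 (produced but never consumed), mirroring the diagnostics described.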
Effectiveness. Requirements analysis is effective for maintaining accurate requirements specifications. It is essential for large systems with many requirements. On the other hand, most existing requirements analysis tools are expensive to obtain and use, and they may not be cost-effective for development of small systems.
Applicability. Requirements analysis is applicable for developing most systems. It is particularly useful for analysis of requirements for large and complex systems.
Maturity. Requirements analysis is generally used on large, complex systems developed over time.
User Training. Requirements analysis generally requires considerable personnel training.
Costs. Most requirements analysis tools are expensive to obtain and use. They generally require a large amount of main storage and can therefore only be used on large computers.


REGRESSION TESTING
Regression testing is a technique that detects spurious errors caused by software modifications or corrections.
Information Input. For regression testing, a set of software test cases must be maintained and available throughout the entire life of the software. The test cases should be complete enough so that all of the software's functional capabilities are thoroughly tested. If available, acceptance tests should be used to form the base set of tests. In addition to the individual test cases themselves, detailed descriptions or samples of the correct output expected from each test case must also be supplied and maintained.
Information Output. The output from regression testing is simply the output produced by the software from the execution of each of the individual test cases.
Outline of Method. Regression testing is the process of retesting software to detect errors that may have been caused by program changes. The technique requires the use of a set of test cases that have been developed to test all of the software's functional capabilities. If an absolute determination of those portions of the software that can potentially be affected by a given change can be made, only those portions need to be tested. Associated with each test case is a description or sample of the correct output for that test case. When the tests have been executed, the actual output is compared with the expected output for correctness. As errors are detected during the actual operation of the software that were not detected by regression testing, a test case that could have uncovered the error should be constructed and included with the existing test cases.
Although not required, tools can be used to aid in performing regression testing. Automatic test harnesses can be used to assist in managing test cases and in controlling test execution. File comparators are often useful for comparing actual output with expected output. Assertion processors are also useful in verifying the correctness of the output for a given test.
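As an illustration, a minimal regression harness might replay the maintained test cases and compare actual output with stored expected output, in the manner of a file comparator. This sketch is not from the original material; the payroll command, test-case names, and file layout are hypothetical.

import subprocess
from pathlib import Path

# Maintained set of test cases: the command to run and the stored expected output.
test_cases = {
    "TC-01": {"cmd": ["./payroll", "tests/TC-01.in"], "expected": "tests/TC-01.expected"},
    "TC-02": {"cmd": ["./payroll", "tests/TC-02.in"], "expected": "tests/TC-02.expected"},
}

failures = []
for name, case in test_cases.items():
    result = subprocess.run(case["cmd"], capture_output=True, text=True)
    expected = Path(case["expected"]).read_text()
    if result.stdout != expected:  # file-comparator step
        failures.append(name)

print("Failed regression tests:", failures if failures else "none")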
Example. An example of regression testing follows.
Application. A transaction processing system contains a dynamic data field editor that provides a variety of input/output field editing capabilities. Each transaction is composed of data fields as specified by a data element dictionary entry. A fixed identifier contained in a data field descriptor in the dictionary entry specifies the input and output edit routine used by each data field. When a transaction is entered, each field is edited by the appropriate input editor routine as specified in the dictionary entry. Output editing consists of using output editor routines to format the output.
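A minimal sketch (not part of the original material) of the dictionary-driven field editing described above; the field names, edit routines, and range limits are hypothetical.

def edit_numeric(value):
    n = int(value)
    if not (0 <= n <= 9999):  # a hypothetical, overly restrictive range check
        raise ValueError("numeric field out of range")
    return n

def edit_text(value):
    return value.strip()

# Data element dictionary: each field descriptor names its input edit routine.
data_element_dictionary = {
    "amount": {"input_editor": edit_numeric},
    "name": {"input_editor": edit_text},
}

def edit_transaction(fields):
    # Apply the input edit routine specified for each field by the dictionary entry.
    return {k: data_element_dictionary[k]["input_editor"](v) for k, v in fields.items()}

print(edit_transaction({"amount": "120", "name": " SMITH "}))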
Error. An input edit routine for editing numeric data fields was modified to perform a fairly restrictive range check needed by a particular transaction program. Current system documentation indicated that this edit routine was used only by that single transaction program. The documentation, however, was not up to date; another, highly critical, transaction program also used the routine, often with data falling outside the newly added range check.
Error Discovery. Regression testing would uncover the error if a sufficient set of functional tests were used for the retesting. If only the transaction program for which the modification was made were tested, the error would not be discovered until actual operation.
Effectiveness. The effectiveness of the technique depends on the quality of the test data used. If functional testing is used to create the test data, the technique is highly effective. Although the burden and expense associated with the technique may seem prohibitive (particularly for small changes), it is often quite straightforward to determine which functions can potentially be affected by a given change. In such cases, the extent of the testing can be reduced to a more manageable size.
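Where the affected portions of the software can be determined, the module-to-test-case matrix described earlier can drive test selection. A minimal sketch (not from the original material), with hypothetical module and test-case names, follows.

# Second test evaluation matrix: modules mapped to the test cases that exercise them.
modules_to_tests = {
    "edit_numeric_field": {"TC-03", "TC-08"},
    "edit_date_field": {"TC-04"},
}

changed_modules = {"edit_numeric_field"}  # modules touched by the modification

tests_to_rerun = set()
for module in changed_modules:
    tests_to_rerun |= modules_to_tests.get(module, set())

print("Regression tests to rerun:", sorted(tests_to_rerun))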
Applicability. This method is applicable during the unit and module test phases.
Maturity. Regression testing is a mature method and is widely used.
User Training. No special training is required to apply this technique. If tools are used in support of regression testing, however, knowledge of their use is required. Moreover, successful application of the technique requires establishment of procedures and the management control necessary to ensure adherence to those procedures.
Costs. Because testing is required for software modifications anyway, regression testing incurs no additional effort (assuming that only the necessary functional capabilities are retested). The use of tools may increase the cost, but it also increases effectiveness.
