Testing with Pseudorandom Data

Building Test Harnesses with Known Good Data for UDT Testing

© Guy Lecky-Thompson

Looking at ways to create data sets for testing user-defined data types using pseudorandom data generation techniques.

Introduction

This article is for programmers who want to validate user-defined data types and memory allocation with large quantities of data, simply and easily. We are interested in 3 areas of testing:

  1. Stress Testing
  2. Data Integrity
  3. Borderline Cases

In Stress Testing, our aim is to fill the system to breaking point. The Data Integrity tests check that what we get out of the system is what we put in. The Borderline Cases test the limits of individual components of the system.

Pseudorandom Numbers in C/C++

A pseudorandom number sequence is one which has no discernible pattern, but which is always the same for the same seed. The stdlib.h header declares two functions useful for generating pseudorandom number sequences:

void srand( unsigned int seed )
int rand( void )

To use these to generate (for example) 100 random numbers between 0 and 99, we would use code such as:

// Needs stdlib.h for srand/rand and time.h for time
int i, nNum;
// Randomize on current time
srand ( (unsigned int) time (NULL) );
for (i = 0; i < 100; i++)
{
    nNum = rand() % 100;
    // Do something with nNum
}

The above will generate a different sequence each time it is run, provided the runs start in different seconds, since time() typically has one-second resolution. However, if we replace the time (NULL) argument with a fixed number, we generate an identical sequence every time:

srand ( 42 );
for (i = 0; i < 100; i++)
{
    nNum = rand() % 100;
    // Do something with nNum
}

This behavior is vital in creating a test harness.
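The repeatability described above can be checked directly. The following is a minimal sketch, not part of the original code: MakeSequence and SequenceIsRepeatable are hypothetical helper names, and the range 0 to 99 simply mirrors the earlier example.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical helper: returns `count` values in the range 0..99,
// produced by rand() after seeding with `seed`.
std::vector<int> MakeSequence(unsigned int seed, int count)
{
    assert(count >= 0);
    std::vector<int> values;
    std::srand(seed);                    // Same seed, same sequence
    for (int i = 0; i < count; i++)
    {
        values.push_back(std::rand() % 100);
    }
    return values;
}

// Hypothetical helper: true when two runs with the same seed match exactly.
bool SequenceIsRepeatable(unsigned int seed, int count)
{
    return MakeSequence(seed, count) == MakeSequence(seed, count);
}
```

Calling SequenceIsRepeatable( 42, 100 ) should return true on any conforming implementation, because srand with the same seed restarts the same sequence.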

Creating the Test Harness

If we assume that we want to test a simple product database application, we might have a stock entry as follows:

struct stock_entry
{
    char szProduct[25];
    float fPrice;
    long lStock;
};

For stress testing, we want to allocate memory for the structure until no more can be allocated, and then unwind the data. We also want to check the data integrity while we do it. One way to do this is with a small program designed to test the user-defined structure. We assume that we have defined functions to perform the following:

// Returns pointer to new stock entry in memory, or null when out of memory
stock_entry * NewStockEntry( stock_entry * heap );
// Remove the tail of the heap of stock entries, return new last or null
stock_entry * DeleteLastStockEntry( stock_entry * heap );
// Get/Set the contents of a stock entry
stock_entry * GetLastStockEntry( stock_entry * heap );
void SetStockData( stock_entry * item, stock_entry data );
// Comparison, returns 0 when the item matches the data
int CompareStockData( stock_entry * item, stock_entry data);
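The article assumes these functions already exist. As an illustration only, here is one possible sketch of CompareStockData, assuming it returns 0 on a match (which is how the harness uses it). It compares field by field rather than with memcmp, since memcmp would also compare padding bytes and any bytes after the string terminator.

```cpp
#include <cassert>
#include <cstring>

// Same layout as the article's stock_entry, repeated so the sketch compiles alone.
struct stock_entry
{
    char szProduct[25];
    float fPrice;
    long lStock;
};

// Assumed behavior: returns 0 when the stored item matches the expected data,
// non-zero otherwise.
int CompareStockData(stock_entry * item, stock_entry data)
{
    assert(item != NULL);
    // Compare the names as strings, never reading past the array
    if (std::strncmp(item->szProduct, data.szProduct, sizeof data.szProduct) != 0)
    {
        return 1;
    }
    // Exact comparison is intended: both values come from the same computation
    if (item->fPrice != data.fPrice)
    {
        return 1;
    }
    if (item->lStock != data.lStock)
    {
        return 1;
    }
    return 0;
}
```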

For each item that we allocate memory for, we want to fill it with known data, check that data later, and release the item. To perform the test, we do this in 3 stages:

  1. Allocate as many items as possible
  2. Check the data integrity
  3. Deallocate the items

Since we are using pseudorandom techniques, all we need to do is seed the random number generator with the sequence number of the current item, generate the data, and store it. When we want to check the data, we can just re-seed the random number generator and check the stored data against it. This is easily achieved:

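// Allocate and populate
long itemID = 0;
stock_entry * new_item;
while ( ( new_item = NewStockEntry ( oHeap ) ) != NULL )
{
    srand ( (unsigned int) itemID ); // Seed with the item's sequence number
    GenerateItemData ( new_item );
    itemID++;
}
itemID--; // Sequence number of the last item allocated
// Check and deallocate, working back from the tail
while ( itemID >= 0 )
{
    stock_entry * last_entry = GetLastStockEntry ( oHeap );
    stock_entry check_entry;
    srand ( (unsigned int) itemID ); // Re-seed to regenerate the same data
    GenerateItemData ( &check_entry );
    if ( CompareStockData ( last_entry, check_entry ) != 0)
    {
        // Raise some sort of error
    }
    DeleteLastStockEntry ( oHeap ); // Remove the tail we just checked
    itemID--;
}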

All that remains is to create the GenerateItemData function, which populates a stock_entry structure. That is left as an exercise for the reader; it involves multiple calls to rand(), with no call to srand until the next item needs to be populated.
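For readers who want a starting point, the following is one possible sketch of GenerateItemData, not the author's implementation. The value ranges (upper-case names, prices up to 99.99, stock up to 9999) are assumptions chosen for illustration, and SameEntry is a hypothetical helper added only to show that re-seeding reproduces the data.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Same layout as the article's stock_entry, repeated so the sketch compiles alone.
struct stock_entry
{
    char szProduct[25];
    float fPrice;
    long lStock;
};

// Fills *item from successive rand() calls. The caller seeds with srand()
// first; this function never re-seeds.
void GenerateItemData(stock_entry * item)
{
    assert(item != NULL);
    // Product name: 1 to 24 upper-case letters, always null-terminated
    int length = 1 + std::rand() % 24;
    for (int i = 0; i < length; i++)
    {
        item->szProduct[i] = (char)('A' + std::rand() % 26);
    }
    item->szProduct[length] = '\0';
    // Price: 0.00 to 99.99, built from whole cents
    item->fPrice = (float)(std::rand() % 10000) / 100.0f;
    // Stock level: 0 to 9999
    item->lStock = std::rand() % 10000;
}

// Hypothetical helper: true when every field of the two entries matches.
bool SameEntry(const stock_entry & a, const stock_entry & b)
{
    return std::strcmp(a.szProduct, b.szProduct) == 0
        && a.fPrice == b.fPrice
        && a.lStock == b.lStock;
}
```

Seeding with the same value before each call should yield two entries for which SameEntry returns true, which is the property the harness relies on.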

Going Forward

This has given the reader a glimpse of what is possible, but it is only the tip of the iceberg. We can also write a program that creates test data sets and supply it to multiple software providers, so that they can test their implementations against a set of known data. The advantage is clear: the more testing we do, the better the implementation will be. Creating test data from pseudorandom number sequences is also an effective use of programmer time, as it automates testing against known good data.
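One caveat when sharing a generator program: the C standard does not specify the algorithm behind rand(), so the same seed may give different sequences on different compilers. The sketch below swaps in std::minstd_rand from the C++11 random header, whose sequence is fixed by the standard. WriteTestDataSet and MakeTestDataSet are hypothetical names, and the "index,value" line format is an assumption made for illustration.

```cpp
#include <cassert>
#include <ostream>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: writes `count` lines of "index,value" for the given
// seed, with each value in the range 0..99.
void WriteTestDataSet(std::ostream & out, unsigned int seed, int count)
{
    assert(count >= 0);
    std::minstd_rand generator(seed);    // Sequence is defined by the C++ standard
    for (int i = 0; i < count; i++)
    {
        out << i << ',' << generator() % 100 << '\n';
    }
}

// Hypothetical convenience wrapper: returns the data set as a string.
std::string MakeTestDataSet(unsigned int seed, int count)
{
    std::ostringstream out;
    WriteTestDataSet(out, seed, count);
    return out.str();
}
```

The plain modulo is used instead of std::uniform_int_distribution because the distribution classes are not required to produce the same values on every implementation.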


The copyright of the article Testing with Pseudorandom Data in Computer Programming Tutorials is owned by Guy Lecky-Thompson. Permission to republish Testing with Pseudorandom Data must be granted by the author in writing.



