Looking at ways to create data sets for testing user-defined data types using pseudorandom data generation techniques.
This article is for programmers who want to validate user-defined data types and memory allocation against very large quantities of data in a simple manner. The three areas of testing we are interested in are stress testing, data integrity, and borderline cases.
In Stress Testing our aim is to fill the system to breaking point. The Data Integrity tests are designed to check that what we get out of the system is exactly what we put in. The Borderline Cases are designed to test the limits of individual components of the system.
A pseudorandom number sequence is one that has no discernible pattern but is always the same for a given seed. The stdlib.h header declares two functions useful for generating such sequences: srand, which seeds the generator, and rand, which returns the next number in the sequence.
To use these to generate (for example) 100 random numbers between 0 and 99, we would use code such as:
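A minimal sketch of such code follows. The helper name fill_random and the output array are assumptions made for illustration; only srand, rand, and time come from the standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define COUNT 100

/* Hypothetical helper: seeds the generator from the current time, then
   stores n pseudorandom values in the range 0..99 in out. */
void fill_random(int *out, size_t n)
{
    size_t i;

    assert(out != NULL);

    srand((unsigned int)time(NULL));   /* seed taken from the clock */

    for (i = 0; i < n; i++) {
        out[i] = rand() % 100;         /* remainder is always 0..99 */
    }
}
```

Calling fill_random(values, COUNT) on an int values[COUNT] array and printing each element shows the sequence. The % 100 reduction is very slightly biased toward low values, which is harmless for test data.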
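The above will generate a different sequence each time it is run, within reason: time(NULL) typically has one-second resolution, so two runs started within the same second share a seed. However, if we replace the time(NULL) argument with a fixed number, we generate an identical sequence on every run: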
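The fixed-seed variant might look like the sketch below. The names fill_seeded and same_seed_repeats are hypothetical; the second function simply generates the sequence twice from one seed and confirms that both runs agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define COUNT 100

/* Hypothetical helper: stores n values in 0..99 generated from seed. */
void fill_seeded(int *out, size_t n, unsigned int seed)
{
    size_t i;

    assert(out != NULL);

    srand(seed);                       /* fixed seed, repeatable sequence */
    for (i = 0; i < n; i++) {
        out[i] = rand() % 100;
    }
}

/* Returns 1 if two runs from the same seed produce identical sequences,
   0 otherwise. */
int same_seed_repeats(unsigned int seed)
{
    int first[COUNT], second[COUNT];
    size_t i;

    fill_seeded(first, COUNT, seed);
    fill_seeded(second, COUNT, seed);

    for (i = 0; i < COUNT; i++) {
        if (first[i] != second[i]) {
            return 0;
        }
    }
    return 1;
}
```

The repetition is guaranteed only within one C library; a different library may produce a different, though equally repeatable, sequence for the same seed.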
This behavior is vital in creating a test harness.
If we assume that we want to test a simple product database application, we might have a stock entry as follows:
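A stock entry of this kind could be declared along the following lines. The structure name item_data is taken from later in the article; every field name, type, and size is an assumption made for illustration, as is the item_equal helper, which compares two entries field by field rather than with memcmp so that padding bytes are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DESCRIPTION_LEN 32

/* Hypothetical stock entry; all fields are illustrative assumptions. */
struct item_data {
    unsigned long product_code;
    char          description[DESCRIPTION_LEN];  /* NUL-terminated text */
    unsigned int  quantity;
    unsigned long unit_price;      /* in the smallest currency unit */
};

/* Returns nonzero if every field of a matches the same field of b. */
int item_equal(const struct item_data *a, const struct item_data *b)
{
    assert(a != NULL && b != NULL);

    return a->product_code == b->product_code
        && strcmp(a->description, b->description) == 0
        && a->quantity == b->quantity
        && a->unit_price == b->unit_price;
}
```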
For stress testing, we want to allocate memory for the structure until no more can be allocated, and then unwind the data. We also want to check the data integrity while we do it. One way we can do this is with a small program designed to test the user-defined structure. We assume that we have defined functions to perform the following:
For each item that we allocate memory for, we want to:
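In order to perform the test, we need to work in three stages: allocate and populate items until no more memory is available, check every stored item, and then free everything.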
Since we are using pseudorandom techniques, all we need to do is seed the random number generator with the sequence number of the current item, generate the data, and store it. When we want to check the data, we can just re-seed the random number generator and check the stored data against it. This is easily achieved:
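One way this could look is sketched below, under several assumptions: the structure is cut down to two data fields plus a next pointer so items can be chained and unwound, GenerateItemData is a trivial stand-in, and run_stress_test is a hypothetical name. The max_items cap is also an addition, so that the sketch can be run safely without actually exhausting memory; passing a very large value approximates allocating until malloc fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical stock entry chained into a linked list. */
struct item_data {
    unsigned int product_code;
    unsigned int quantity;
    struct item_data *next;
};

/* Stand-in generator: fills the data fields from rand() only. */
void GenerateItemData(struct item_data *item)
{
    item->product_code = (unsigned int)rand();
    item->quantity     = (unsigned int)(rand() % 1000);
}

/* Runs the three stages and returns the number of items that failed
   verification; *allocated receives the number of items created. */
size_t run_stress_test(size_t max_items, size_t *allocated)
{
    struct item_data *head = NULL, *tail = NULL, *p;
    size_t count = 0, seq = 0, mismatches = 0;

    assert(allocated != NULL);

    /* Stage 1: allocate and populate until malloc fails or the cap is hit. */
    while (count < max_items) {
        struct item_data *item = malloc(sizeof *item);
        if (item == NULL) {
            break;
        }
        srand((unsigned int)count);    /* seed with the sequence number */
        GenerateItemData(item);
        item->next = NULL;
        if (tail != NULL) {
            tail->next = item;
        } else {
            head = item;
        }
        tail = item;
        count++;
    }

    /* Stage 2: re-seed per item, regenerate the data, and compare. */
    for (p = head; p != NULL; p = p->next, seq++) {
        struct item_data expected;
        srand((unsigned int)seq);
        GenerateItemData(&expected);
        if (p->product_code != expected.product_code ||
            p->quantity != expected.quantity) {
            mismatches++;
        }
    }

    /* Stage 3: unwind, freeing every item. */
    while (head != NULL) {
        p = head->next;
        free(head);
        head = p;
    }

    *allocated = count;
    return mismatches;
}
```

Because the expected values are regenerated from the seed, the test never needs a second copy of the data, which matters when the point is to fill memory.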
All that remains is to create the GenerateItemData function, which populates the item_data structures. That is left as an exercise for the reader, and involves multiple calls to rand(), but without calling srand until the next item needs to be populated.
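As a starting point for that exercise, here is one possible shape for GenerateItemData. The fields of item_data shown here are assumptions, as are the value ranges; the essential point is that every field comes from successive rand() calls and srand() is never called inside the function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DESCRIPTION_LEN 32

/* Hypothetical stock entry; all fields are illustrative assumptions. */
struct item_data {
    unsigned long product_code;
    char          description[DESCRIPTION_LEN];
    unsigned int  quantity;
    unsigned long unit_price;
};

/* Populates every field from rand(). The caller is expected to have
   called srand() with the item's sequence number beforehand. */
void GenerateItemData(struct item_data *item)
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    size_t i, length;

    assert(item != NULL);

    item->product_code = (unsigned long)rand();

    /* Zero the buffer first so the text is always NUL-terminated. */
    memset(item->description, 0, sizeof item->description);

    /* Description of 1..DESCRIPTION_LEN-1 uppercase letters. */
    length = 1 + (size_t)rand() % (DESCRIPTION_LEN - 1);
    for (i = 0; i < length; i++) {
        item->description[i] = letters[rand() % 26];
    }

    item->quantity   = (unsigned int)(rand() % 1000);
    item->unit_price = (unsigned long)(rand() % 100000);
}
```

The order of the rand() calls is part of the data format: reordering the fields in this function changes the data generated for every seed.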
This has given the reader a glimpse of what is possible, but it is only the tip of the iceberg. We can also write a program that creates test data sets and supply it to multiple software providers, so that they can test their implementations against the same known data. Because the C standard does not fix the algorithm behind rand(), such a program needs to be built against the same library everywhere, or carry its own generator, if the data must match exactly. The advantage is clear: the more time we spend testing, the better the implementation will be. Creating test data from pseudorandom number sequences is also an effective use of programmer time, since checking against known good data is fully automated.