AmvTek blog

complex web systems

Making good use of random in your python unit tests

Writing effective unit tests to validate that your code performs as expected is a difficult endeavor. A lot has been written about the benefits of test-driven development and about how best to approach testing, and a lot can be learned from the available literature. One thing we rarely see mentioned, however, is that architecting effective unit tests is hard, and that no tool or testing framework is of much value without a fair understanding of the code base that needs to be tested.

The techniques we briefly introduce here are no different. You may use them to impress your colleagues by showing them a test suite you have just written that contains millions of tests. Be aware, though, that increasing the test count of a suite may not be enough to meaningfully change the coverage of the code base.

Basic idea

Assume you wish to provide tests for a hypothetical func_to_test that looks like this:

def func_to_test(x, y):
    return result

To unit test func_to_test, our first goal is to generate values that cover the expected domain as well as possible.

Assume that x and y are floating point numbers varying within [xmin, xmax] and [ymin, ymax] respectively.

You may generate a range of values for calling func_to_test like so:

import math, itertools

def float_range(vmin, vmax, n):
    "yield n regularly spaced values from vmin up to, but excluding, vmax..."

    # step between two consecutive values
    s = float(vmax-vmin)/n

    v = vmin
    for i in range(n):

        yield v
        v += s

def gen_func_to_test_sample(m):
    "yield at least m tuples covering func_to_test domain..."

    # calculate optimal number of values along each axis
    n = int(math.ceil(math.sqrt(m)))

    # define value range for x and y
    # (xmin, xmax, ymin and ymax are assumed to be defined at module level)
    rx = float_range(xmin, xmax, n)
    ry = float_range(ymin, ymax, n)

    # yield regularly spaced tuples covering func_to_test domain
    for t in itertools.product(rx, ry):
        yield t

In this simple case, two nested loops would be a simpler way to generate the values covering the func_to_test domain. However, if func_to_test takes a large number of arguments, itertools.product keeps things manageable.
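As a rough sketch of how this scales, the helper below generalizes the grid to any number of axes. The name gen_grid_sample and its domains argument are hypothetical and introduced here for illustration only; float_range is repeated (written for Python 3) so that the example runs on its own.

```python
import itertools
import math


def float_range(vmin, vmax, n):
    "yield n regularly spaced values from vmin up to, but excluding, vmax"
    s = float(vmax - vmin) / n
    v = vmin
    for i in range(n):
        yield v
        v += s


def gen_grid_sample(domains, m):
    "return an iterator over at least m tuples covering the given domains"
    # domains is a hypothetical list of (vmin, vmax) pairs, one per axis
    k = len(domains)

    # number of values per axis, chosen so that n ** k >= m
    n = int(math.ceil(m ** (1.0 / k)))
    while n ** k < m:  # guard against floating point rounding
        n += 1

    # one regularly spaced range per axis, combined by itertools.product
    axes = [float_range(vmin, vmax, n) for vmin, vmax in domains]
    return itertools.product(*axes)


# three axes, at least 100 points requested
points = list(gen_grid_sample([(0.0, 8.0), (2.0, 6.0), (-1.0, 1.0)], 100))
print(len(points))
```

Adding an axis only means adding one more (vmin, vmax) pair to the list, whereas the nested loop version would need one more loop level.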

The basic idea of randomization is to cover the problem space with randomly generated values. Randomization has two benefits over the previous approach:

  • The code that generates values over the problem domain is much simpler.
  • Because test values are irregularly spaced, you will not be trapped by singularities.
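One way to read the second point is sketched below. The function fragile is purely hypothetical: it misbehaves only on a narrow band of inputs that a regular grid with a step of 0.5 never touches, while randomly placed values are very likely to land in it.

```python
import random


def fragile(x):
    "hypothetical function that misbehaves on a narrow band of inputs"
    frac = x - int(x)
    return "bad" if 0.2 < frac < 0.3 else "ok"


# regular grid over [0, 8): every value is a multiple of 0.5
grid = [8.0 * i / 16 for i in range(16)]
grid_failures = [x for x in grid if fragile(x) == "bad"]

# same interval, 1000 randomly placed values
rand = [random.uniform(0.0, 8.0) for i in range(1000)]
rand_failures = [x for x in rand if fragile(x) == "bad"]

# the grid reports no failure at all, the random sample reports some
print(len(grid_failures), len(rand_failures))
```

A finer grid would not necessarily help: as long as the step divides the period of the defect, the grid keeps stepping over it.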

To generate a random range of values for calling func_to_test, you may proceed like so:

import random

def gen_func_to_test_random_sample(m):

    for i in range(m):

        yield random.uniform(xmin,xmax), random.uniform(ymin,ymax)

In case you are not familiar with the standard library random module, we invite you to explore it, as it has a lot of features to help generate objects covering complex domains…
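To give a taste of those features, here is a small sketch that builds a composite test object from several random module calls. The "order" record and its field names are made up for this example; only the random and string functions come from the standard library.

```python
import random
import string


def gen_random_order():
    "return one hypothetical 'order' record built from random module calls"
    return {
        # pick one element from a fixed set
        "currency": random.choice(["EUR", "USD", "CHF"]),
        # integer in [1, 100], both bounds included
        "quantity": random.randint(1, 100),
        # float between 0.5 and 250.0
        "unit_price": random.uniform(0.5, 250.0),
        # 3 distinct tags drawn without replacement
        "tags": random.sample(["new", "gift", "fragile", "bulk", "rush"], 3),
        # reference made of 8 random uppercase letters
        "ref": "".join(random.choice(string.ascii_uppercase) for i in range(8)),
    }


order = gen_random_order()
print(order)
```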

Be repeatable

By now you should have a good grasp of the basic idea of test randomization. What we want is to cover the problem space efficiently while minimizing the risk of being trapped by singularities…

There is one big problem with this approach, though: a test suite must be repeatable. Imagine that a developer reports a failure of test 100. If test 100 can never be rerun as it was, our randomized test suite will generate more confusion than value.

Fortunately, the Mersenne Twister random generator exposed by the standard library random module can be seeded so that the same random sequence is generated each time. Let’s modify our sample generator to make use of this:

from random import Random

def gen_func_to_test_random_sample(seed,m):

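    # the seed is a string combining seed and m
    # (recent Python 3 releases reject tuple seeds)
    random = Random("%s:%s" % (seed, m))

    for i in range(m):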

        yield random.uniform(xmin,xmax), random.uniform(ymin,ymax)

We use a dedicated instance of Random so as not to interfere with other threads that may also need random values at the very moment we are generating the test sequence.

Using the same seed value for each run of the test suite guarantees that the same sample sequence is generated…
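The repeatability claim is easy to check with a short sketch. The bounds XMIN, XMAX, YMIN and YMAX are example values assumed here, and the seed is passed to Random as a string, which is a choice made for this sketch so that it runs on recent Python 3 releases.

```python
from random import Random

# example bounds, assumed for this sketch
XMIN, XMAX = 0.0, 8.0
YMIN, YMAX = 2.0, 6.0


def gen_func_to_test_random_sample(seed, m):
    "yield m repeatable random points over the example domain"
    # dedicated generator, seeded with a string built from seed and m
    rnd = Random("%s:%s" % (seed, m))
    for i in range(m):
        yield rnd.uniform(XMIN, XMAX), rnd.uniform(YMIN, YMAX)


first_run = list(gen_func_to_test_random_sample("my test suite", 5))
second_run = list(gen_func_to_test_random_sample("my test suite", 5))
other_seed = list(gen_func_to_test_random_sample("another seed", 5))

# same seed, same sequence
print(first_run == second_run)  # True
# a different seed gives a different sequence
print(first_run == other_seed)
```

With this in place, "test 100" always refers to the same point as long as the seed and the sample size are left unchanged.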

TestCase factories

If you have written tests before, you are probably asking yourself by now how to use this large sequence of (random) objects you have been advised to generate.

The obvious approach would be to write a single test method that iterates over the sample sequence and applies the desired assertions to the func_to_test results. We advise against doing so: such a test method would have to apply a very large number of assertions, and it would exit at the first problem encountered without checking the remaining samples.

Instead, you can use a factory function that takes care of generating your TestCase, like so:

"Your test module"

import unittest
from random import Random

from somewhere import func_to_test

XDOMAIN = (0.0,8.0) # example (xmin,xmax)
YDOMAIN = (2.0,6.0) # example (ymin,ymax)

def gen_func_to_test_random_sample(seed,m):
    "yield random point over func_to_test domain..."

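    # the seed is a string combining seed and m
    # (recent Python 3 releases reject tuple seeds)
    random = Random("%s:%s" % (seed, m))

    for i in range(m):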

        yield random.uniform(*XDOMAIN), random.uniform(*YDOMAIN)

def build_TestFuncTestCase(seed,m):
    "return TestCase class for func_to_test..."

    # test method factory
    def make_test_method(test_point):
        "return func_to_test test..."

        def a_test(self):

            result = func_to_test(*test_point)

            # all your asserts here, see unittest.TestCase documentation...

        return a_test

    # fill TestCase dict
    count = 0
    dico = {}
    for pt in gen_func_to_test_random_sample(seed,m):

        testname = "test_func_to_test_%i" % count
        dico[testname] = make_test_method(pt)
        count += 1

    return type("TestFuncTestCase",(unittest.TestCase,),dico)

# this TestCase class will be picked up by Test Runner
# it will contain 1024 tests...
TestFuncTestCase = build_TestFuncTestCase("my test suite",1024)
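To see the factory at work end to end, here is a condensed, self-contained sketch written for Python 3. The body of func_to_test and the symmetry assertion are stand-ins assumed for the example, and the suite is kept to 16 tests to keep the output short.

```python
import unittest
from random import Random


def func_to_test(x, y):
    "stand-in for the real function, assumed here for illustration only"
    return x + y


def build_TestFuncTestCase(seed, m):
    "return a TestCase class with one test method per random point"
    rnd = Random("%s:%s" % (seed, m))
    points = [(rnd.uniform(0.0, 8.0), rnd.uniform(2.0, 6.0)) for i in range(m)]

    def make_test_method(test_point):
        def a_test(self):
            x, y = test_point
            # example assertion: the stand-in must be symmetric
            self.assertEqual(func_to_test(x, y), func_to_test(y, x))
        return a_test

    dico = {}
    for count, pt in enumerate(points):
        dico["test_func_to_test_%i" % count] = make_test_method(pt)
    return type("TestFuncTestCase", (unittest.TestCase,), dico)


TestFuncTestCase = build_TestFuncTestCase("my test suite", 16)

# run the whole generated class
suite = unittest.TestLoader().loadTestsFromTestCase(TestFuncTestCase)
result = unittest.TextTestRunner(verbosity=0).run(suite)

# rerun a single reported test by its generated name
single = unittest.TestSuite([TestFuncTestCase("test_func_to_test_7")])
single_result = unittest.TextTestRunner(verbosity=0).run(single)
```

Because each point gets its own named test method, a failure report such as "test_func_to_test_7 failed" can be replayed in isolation. From the command line, the same selection can be made with python -m unittest your_module.TestFuncTestCase.test_func_to_test_7, where your_module is a placeholder for your test module name.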

In our experience, randomization, when applicable, provides an efficient way to unit test your module. The approach can be summarized like this:

  1. Write code that generates a repeatable (pseudo-random) sequence of objects over your problem domain.
  2. Use a factory function to generate TestCase subclasses with one test method for each object in your test sequence.