python unit-testing multidimensional-array random

Proper way to use Python Unittest on a random 2D array

Suppose I have a function that returns a 3 by 3 2D array with random entries within given bounds:

import random

def random3by3Matrix(smallest_num, largest_num):
    matrix = [[0 for x in range(3)] for y in range(3)]
    for i in range(3):
        for j in range(3):
            matrix[i][j] = random.randrange(int(smallest_num),
                                            int(largest_num + 1)) if smallest_num != largest_num else smallest_num
    return matrix

print(random3by3Matrix(-10, 10))

The code above prints something like this:

[[-6, 10, -4], [-10, -9, 8], [10, 1, 1]]

How would I write a unittest for a function like this? I thought of using a helper function:

def isEveryEntryGreaterEqual(list1, list2):
    for i in range(len(list1)):
        for j in range(len(list1[0])):
            if not (list1[i][j] <= list2[i][j]):
                return False
    return True


class TestFunction(unittest.TestCase):
    def test_random3by3Matrix(self):
        lower_bound = [[-10 for x in range(3)] for y in range(3)]
        upper_bound = [[10 for x in range(3)] for y in range(3)]

        self.assertEqual(True, isEveryEntryGreaterEqual(lower_bound, random3by3Matrix(-10,10)))
        self.assertEqual(True, isEveryEntryGreaterEqual(random3by3Matrix(-10,10), upper_bound))

But is there a cleaner way to do this? Furthermore, how would you test that all of your values are not only between the boundaries, but also distributed randomly?

Solution

Test matrix bounds

It looks like you want to test whether every single element in the matrix lies within the bounds, independently of where in the matrix the element is. You can make this code shorter and more readable by extracting all the elements from the matrix and checking them in one go, instead of using the double for loop. You can flatten any array nesting into a 1D array with numpy.ravel() (or the flatten() method of a numpy array; there is no module-level numpy.flatten()), and then test the resulting 1D array with Python's built-in all() function. This way, you avoid writing the nested loops yourself:

import numpy as np

def is_matrix_in_bounds(matrix, low, high):
    # np.ravel() accepts a nested list and returns a 1D array
    flat = np.ravel(matrix)
    # Each element is a boolean that is True if the value is within bounds
    in_bounds = [low <= e <= high for e in flat]
    # all() returns True only if every element of in_bounds is True;
    # it returns False as soon as it meets a single False element
    return all(in_bounds)

import unittest

class TestFunction(unittest.TestCase):
    def test_random3by3Matrix(self):
        lower_bound = -10
        upper_bound = 10
        matrix = random3by3Matrix(lower_bound, upper_bound)
        self.assertTrue(is_matrix_in_bounds(matrix, lower_bound, upper_bound))
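
If you would rather not depend on numpy just to flatten a small nested list, the same check can be sketched with the standard library. This is a minimal sketch, assuming the matrix is a list of lists exactly one level deep; the name is_matrix_in_bounds_stdlib is made up for illustration.

```python
from itertools import chain


def is_matrix_in_bounds_stdlib(matrix, low, high):
    # chain.from_iterable() walks the rows one after another,
    # yielding the entries as a single flat stream
    flat = chain.from_iterable(matrix)
    # a generator expression lets all() stop at the first out-of-bounds entry
    return all(low <= e <= high for e in flat)
```

Because the generator expression is consumed lazily, no intermediate list of booleans is built.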

If you will be using things like the matrix and the bounds in multiple tests, it may be beneficial to make them class attributes, so you don't have to define them in each test function.
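
As a rough sketch of that idea: the bounds can live on the class, and setUp() can build a fresh matrix before each test. The class name, the test_shape test and the compact stand-in for random3by3Matrix are assumptions made so the example runs on its own.

```python
import random
import unittest


def random3by3Matrix(smallest_num, largest_num):
    # compact stand-in for the function from the question
    return [[random.randint(smallest_num, largest_num) for _ in range(3)]
            for _ in range(3)]


class TestRandomMatrix(unittest.TestCase):
    # class attributes: shared by every test method in this class
    lower_bound = -10
    upper_bound = 10

    def setUp(self):
        # runs before each test method, so every test gets a fresh matrix
        self.matrix = random3by3Matrix(self.lower_bound, self.upper_bound)

    def test_shape(self):
        self.assertEqual(len(self.matrix), 3)
        for row in self.matrix:
            self.assertEqual(len(row), 3)

    def test_bounds(self):
        for row in self.matrix:
            for entry in row:
                self.assertGreaterEqual(entry, self.lower_bound)
                self.assertLessEqual(entry, self.upper_bound)
```

Saved in a test module, this can be run with python -m unittest.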

Test matrix randomness

Testing whether a matrix is randomly distributed is a bit harder, since it requires a statistical test. Such a test cannot prove randomness. The best you can do is calculate how likely it is to see data like yours if the values really were drawn from the intended distribution, and put a threshold on how low that probability is allowed to be. Since the values in the matrix do not depend on each other, you're in luck, because you can again test them as if they were a 1D sample.

To test this, you can use a Kolmogorov-Smirnov test, which measures the goodness of fit between your matrix entries and a uniform distribution. The test yields a p-value: the probability of seeing a mismatch at least this large if the sample really came from the reference distribution. If the distributions are vastly different, the p-value will be very low. If they are similar, the p-value will be high. You want a random matrix, so you want a high p-value. The usual cutoff for this is 0.05 (which means that about 1 in 20 truly random samples will still be flagged as non-random, because they look kinda non-random by happenstance). Keep in mind that a high p-value does not prove randomness, it only means the test found no evidence against it, and that with just 9 values the test can only catch gross deviations. Strictly speaking, the test is also designed for continuous distributions, so on integer data the p-value is only approximate.

Python provides such a test in the scipy module as scipy.stats.kstest. Here, you can either pass two samples (a two-sample KS test, for which you create a second random uniform sample yourself), or pass the name of some distribution and specify its parameters (a one-sample KS test). For the latter case, the distribution name should be the name of a distribution in scipy.stats, and you pass its parameters via the keyword args=(...). Note that scipy.stats.randint excludes its upper bound, so integers from -10 to 10 inclusive correspond to args=(-10, 11).

import unittest
import numpy as np
from scipy import stats

class TestRandomness(unittest.TestCase):
    def test_matrix_randomness(self):
        lower_bound = -10
        upper_bound = 10
        sample = np.ravel(random3by3Matrix(lower_bound, upper_bound))
        # two-sample test: compare against a freshly drawn reference sample
        # (np.random.randint excludes the upper bound, hence the + 1)
        random_dist = np.random.randint(lower_bound, upper_bound + 1, size=3*3)
        statistic, p_value = stats.kstest(sample, random_dist)
        # one-sample test, equivalent, but neater
        # doesn't require creating a second distribution
        # (scipy.stats.randint also excludes the upper bound)
        statistic, p_value = stats.kstest(sample, "randint", args=(lower_bound, upper_bound + 1))
        self.assertGreater(p_value, 0.05)
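
Nine values are very few for any statistical test. One possible alternative, which is not what the code above does, is to pool the entries of many matrices and run a chi-square goodness-of-fit check on the counts of each value. The sketch below computes the statistic by hand with the standard library only. The helper names, the compact stand-in for random3by3Matrix, the number of matrices (1000) and the seed are all assumptions made for illustration. The critical value 31.41 is the 0.05 cutoff of the chi-square distribution with 20 degrees of freedom (21 possible values minus 1).

```python
import random
from collections import Counter


def random3by3Matrix(smallest_num, largest_num):
    # compact stand-in for the function from the question
    return [[random.randint(smallest_num, largest_num) for _ in range(3)]
            for _ in range(3)]


def pooled_entries(make_matrix, n_matrices):
    # collect the entries of many matrices into one flat list
    return [e for _ in range(n_matrices) for row in make_matrix() for e in row]


def chi_square_statistic(values, low, high):
    # sum of (observed - expected)^2 / expected over every possible value
    n_categories = high - low + 1
    expected = len(values) / n_categories
    counts = Counter(values)
    return sum((counts.get(v, 0) - expected) ** 2 / expected
               for v in range(low, high + 1))


CRITICAL_VALUE = 31.41  # chi-square, 20 degrees of freedom, 0.05 level

random.seed(12345)  # arbitrary seed, only to make the run repeatable
values = pooled_entries(lambda: random3by3Matrix(-10, 10), 1000)
statistic = chi_square_statistic(values, -10, 10)
# True means the counts show no evidence against a uniform distribution
looks_uniform = statistic < CRITICAL_VALUE
```

A statistic above the critical value suggests the counts deviate from uniform more than chance would usually allow. As with the p-value cutoff, a correct generator will still exceed it about 1 time in 20 when no seed is fixed.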

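Note that unit tests with a random aspect will sometimes fail. Such is the nature of randomness: with a 0.05 cutoff, a perfectly correct generator can still fail the check in up to about 1 in 20 runs.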
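
If occasional random failures are a problem, for example on a build server, one common approach is to seed the generator at the start of each test so that every run sees the same matrix. A minimal sketch, assuming the function under test draws from Python's global random module as in the question; the seed value and test names are arbitrary, and random3by3Matrix is a compact stand-in.

```python
import random
import unittest


def random3by3Matrix(smallest_num, largest_num):
    # compact stand-in for the function from the question
    return [[random.randint(smallest_num, largest_num) for _ in range(3)]
            for _ in range(3)]


class TestSeededMatrix(unittest.TestCase):
    def setUp(self):
        # fix the seed so each test sees the same "random" numbers every run
        random.seed(42)

    def test_same_seed_gives_same_matrix(self):
        first = random3by3Matrix(-10, 10)
        random.seed(42)  # rewind the generator to the same state
        second = random3by3Matrix(-10, 10)
        self.assertEqual(first, second)

    def test_bounds(self):
        matrix = random3by3Matrix(-10, 10)
        self.assertTrue(all(-10 <= e <= 10 for row in matrix for e in row))
```

The trade-off is that a seeded test only ever exercises one fixed matrix, so it guards against regressions rather than checking randomness.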

See:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html#scipy.stats.kstest
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.randint.html#scipy.stats.randint