Search code examples
pythonnumpyrandomlcg

Linear congruential generator - weak tests results


I'm trying to do the Dieharder Suite tests on my linear congruential generator.

I am not sure if the tests are performed on my generator or simply the results are so weak.

I generate 2,5 mio lines with the generator save them to file testrands.txt with this header:

#==================================================================
# generator lcg  seed = 1
#==================================================================
type: d
count: 100000
numbit: 32
1015568748
1586005467
2165703038
3027450565
217083232
1587069247
......

I followed this instruction (as in EXAMPLE)

Then I used the Dieharder suite to carry out the tests in the following way:

dieharder -g 202 -f testrands.txt -a

now the results are surprisingly weak (maybe I have generated too few numbers?)

1

2

I do as it is in the guide but still seems to be something not as it should - LCG passes a birthday-spacing (I think it should not) and the remaining results are surprisingly weak

My LCG:

import numpy as np

class LCG(object):

    UZERO: np.uint32 = np.uint32(0)
    UONE : np.uint32 = np.uint32(1)

    def __init__(self, seed: np.uint32, a: np.uint32, c: np.uint32) -> None:
        self._seed: np.uint32 = np.uint32(seed)
        self._a   : np.uint32 = np.uint32(a)
        self._c   : np.uint32 = np.uint32(c)

    def next(self) -> np.uint32:
        self._seed = self._a * self._seed + self._c
        return self._seed

    def seed(self) -> np.uint32:
        return self._seed

    def set_seed(self, seed: np.uint32) -> np.uint32:
        self._seed = seed

    def skip(self, ns: np.int32) -> None:
        """
        Signed argument - skip forward as well as backward

        The algorithm here to determine the parameters used to skip ahead is
        described in the paper F. Brown, "Random Number Generation with Arbitrary Stride,"
        Trans. Am. Nucl. Soc. (Nov. 1994). This algorithm is able to skip ahead in
        O(log2(N)) operations instead of O(N). It computes parameters
        A and C which can then be used to find x_N = A*x_0 + C mod 2^M.
        """

        nskip: np.uint32 = np.uint32(ns)

        a: np.uint32 = self._a
        c: np.uint32 = self._c

        a_next: np.uint32 = LCG.UONE
        c_next: np.uint32 = LCG.UZERO

        while nskip > LCG.UZERO:
            if (nskip & LCG.UONE) != LCG.UZERO:
                a_next = a_next * a
                c_next = c_next * a + c

            c = (a + LCG.UONE) * c
            a = a * a

            nskip = nskip >> LCG.UONE

        self._seed = a_next * self._seed + c_next


#%%
np.seterr(over='ignore')

a = np.uint32(1664525)
c = np.uint32(1013904223)
seed = np.uint32(1)

rng = LCG(seed, a, c)
q = [rng.next() for _ in range(0, 2500000)]

Solution

  • Congratulations, you've shown that your LCG implementation isn't competitive with more modern generators.

    Please note that the birthday spacings test is not the same thing as the birthday test. It is possible for an LCG to pass the former, but any full-cycle k-bit generator which reports its full k-bit state will always fail the latter at an α ≤ 0.01 level for any sample size n ≥ 3*2k/2. (For your 32-bit implementation, that's a sample size of 3*216 = 196608.) There won't be any duplicate values if it's a full-cycle generator, and the "birthday paradox" tells us there ought to be at least one with p ≥ 0.99 if the stream of values is truly random.