
Difference between Python 2 and 3 for shuffle with a given seed


I am writing a program compatible with both Python 2.7 and 3.5. Some parts of it rely on stochastic processes. My unit tests use a fixed, arbitrary seed, which gives the same results across executions and Python versions... except for the code using random.shuffle.

Example in Python 2.7:

In[]:   import random
        random.seed(42)
        print(random.random())
        l = list(range(20))
        random.shuffle(l)
        print(l)
Out[]:  0.639426798458
        [6, 8, 9, 15, 7, 3, 17, 14, 11, 16, 2, 19, 18, 1, 13, 10, 12, 4, 5, 0]

Same input in Python 3.5:

In[]:   import random
        random.seed(42)
        print(random.random())
        l = list(range(20))
        random.shuffle(l)
        print(l)
Out[]:  0.6394267984578837
        [3, 5, 2, 15, 9, 12, 16, 19, 6, 13, 18, 14, 10, 1, 11, 4, 17, 7, 8, 0]

Note that the pseudo-random number is the same, but the shuffled lists are different. As expected, re-executing the cells does not change their respective outputs.

How could I write the same test code for the two versions of Python?


Solution

  • In Python 3.2 the random module was refactored a little to make the output uniform across architectures (given the same seed); see issue #7889. As part of that change, the shuffle() method was switched to using Random._randbelow().

    However, the _randbelow() method was also adjusted, so simply copying the 3.5 version of shuffle() is not enough to fix this.

    That said, if you pass in your own random() function, shuffle() in Python 3.5 falls back to the same algorithm as the 2.7 version, j = int(random() * (i+1)), which lets you bypass this limitation:

    random.shuffle(l, random.random)
    

    Note, however, that you are now subject to the old 32-bit vs 64-bit architecture differences that #7889 tried to solve.

    Ignoring several optimisations and special cases, if you include _randbelow() the 3.5 version can be backported as:

    import random
    import sys
    
    if sys.version_info >= (3, 2):
        newshuffle = random.shuffle
    else:
        try:
            xrange
        except NameError:
            xrange = range
    
        def newshuffle(x):
            def _randbelow(n):
                "Return a random int in the range [0,n).  Raises ValueError if n==0."
                getrandbits = random.getrandbits
                k = n.bit_length()  # don't use (n-1) here because n can be 1
                r = getrandbits(k)          # 0 <= r < 2**k
                while r >= n:
                    r = getrandbits(k)
                return r
    
            for i in xrange(len(x) - 1, 0, -1):
                # pick an element in x[:i+1] with which to exchange x[i]
                j = _randbelow(i+1)
                x[i], x[j] = x[j], x[i]
    

    which gives you the same output on 2.7 as 3.5:

    >>> random.seed(42)
    >>> print(random.random())
    0.639426798458
    >>> l = list(range(20))
    >>> newshuffle(l)
    >>> print(l)
    [3, 5, 2, 15, 9, 12, 16, 19, 6, 13, 18, 14, 10, 1, 11, 4, 17, 7, 8, 0]
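For unit tests it may be simpler to skip the version check and always call your own shuffle, driven by a dedicated random.Random instance instead of the module-level state. The sketch below puts the two index-selection strategies discussed above side by side. The function names shuffle_float_style and shuffle_getrandbits_style and the rng parameter are made up for this illustration and are not part of the random module. The float-based variant also avoids the second argument to random.shuffle(), which was deprecated in Python 3.9 and removed in 3.11.

```python
import random


def shuffle_float_style(x, rng):
    """Shuffle x in place, picking indices as int(random() * n) like Python 2.7."""
    for i in range(len(x) - 1, 0, -1):
        # Scale a float from [0.0, 1.0) to an index in x[:i+1].
        j = int(rng.random() * (i + 1))
        x[i], x[j] = x[j], x[i]


def shuffle_getrandbits_style(x, rng):
    """Shuffle x in place, picking indices by rejection sampling on getrandbits()."""
    for i in range(len(x) - 1, 0, -1):
        n = i + 1
        k = n.bit_length()       # enough bits to represent n
        j = rng.getrandbits(k)   # 0 <= j < 2**k
        while j >= n:            # retry until j lands in [0, n)
            j = rng.getrandbits(k)
        x[i], x[j] = x[j], x[i]


if __name__ == "__main__":
    for shuffle in (shuffle_float_style, shuffle_getrandbits_style):
        rng = random.Random(42)  # private generator, same seed as the question
        rng.random()             # consume one value, as the question does
        items = list(range(20))
        shuffle(items, rng)
        print(shuffle.__name__, items)
```

Assuming random.Random(42) is seeded the same way as random.seed(42), the first variant should reproduce the Python 2.7 list from the question and the second the Python 3.5 list, on either interpreter. The float-based variant is still subject to the architecture caveat mentioned above.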