Search code examples
pythonpython-2.7randomsetnon-deterministic

Seeded Python RNG showing non-deterministic behavior with sets


I'm seeing non-deterministic behavior when trying to select a pseudo-random element from sets, even though the RNG is seeded (example code shown below). Why is this happening, and should I expect other Python data types to show similar behavior?

Notes: I've only tested this on Python 2.7, but it's been reproducible on two different Windows computers.

Similar Issue: The issue at Python random seed not working with Genetic Programming example code may be similar. Based on my testing, my hypothesis is that run-to-run memory allocation differences within the sets is leading to different elements getting picked up for the same RNG state.

So far I haven't found any mention of this kind of caveat/issue in the Python docs for set or random.

Example Code (randTest produces different output run-to-run):

import random

''' Class contains a large set of pseudo-random numbers. '''
class bigSet:
    def __init__(self):
        self.a = set()
        for n in range(2000):
            self.a.add(random.random())
        return


''' Main test function. '''
def randTest():
    ''' Seed the PRNG. '''
    random.seed(0)

    ''' Create sets of bigSet elements, presumably many memory allocations. ''' 
    b = set()
    for n in range (2000):
        b.add(bigSet())

    ''' Pick a random value from a random bigSet. Would have expected this to be deterministic. '''    
    c = random.sample(b,1)[0]
    print('randVal: ' + str(random.random()))           #This value is always the same
    print('setSample: ' + str(random.sample(c.a,1)[0])) #This value can change run-to-run
    return

Solution

  • It has to do with object instantiation of mutable objects. if I create a set of frozenset it does give a deterministic result;

    Python 2.7.11 (default, Jan  9 2016, 15:47:04) 
    [GCC 4.2.1 Compatible FreeBSD Clang 3.4.1 (tags/RELEASE_34/dot1-final 208032)] on freebsd10
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import random
    >>> random.seed(0)
    >>> set(frozenset(random.random() for i in range(5)) for j in range(5))
    set([frozenset([0.7298317482601286, 0.3101475693193326, 0.8988382879679935, 0.47214271545271336, 0.6839839319154413]), frozenset([0.5833820394550312, 0.4765969541523558, 0.4049341374504143, 0.30331272607892745, 0.7837985890347726]), frozenset([0.7558042041572239, 0.5046868558173903, 0.9081128851953352, 0.28183784439970383, 0.6183689966753316]), frozenset([0.420571580830845, 0.25891675029296335, 0.7579544029403025, 0.8444218515250481, 0.5112747213686085]), frozenset([0.9097462559682401, 0.8102172359965896, 0.9021659504395827, 0.9827854760376531, 0.25050634136244054])])
    >>> random.seed(0)
    >>> set(frozenset(random.random() for i in range(5)) for j in range(5))
    set([frozenset([0.7298317482601286, 0.3101475693193326, 0.8988382879679935, 0.47214271545271336, 0.6839839319154413]), frozenset([0.5833820394550312, 0.4765969541523558, 0.4049341374504143, 0.30331272607892745, 0.7837985890347726]), frozenset([0.7558042041572239, 0.5046868558173903, 0.9081128851953352, 0.28183784439970383, 0.6183689966753316]), frozenset([0.420571580830845, 0.25891675029296335, 0.7579544029403025, 0.8444218515250481, 0.5112747213686085]), frozenset([0.9097462559682401, 0.8102172359965896, 0.9021659504395827, 0.9827854760376531, 0.25050634136244054])])
    >>> 
    

    If I'm not mistaken, CPython uses a (mutable) object's memory location as it's id and as the key for hashing.

    So while the contents of the objects will always be the same, it's id will be different;

    In [13]: random.seed(0)
    
    In [14]: k = set()
    
    In [15]: for n in range (20):
        k.add(bigSet())
       ....:     
    
    In [16]: for x in k:
        print(id(x))
       ....:     
    34856629808
    34856629864
    34856631936
    34856630424
    34856629920
    34856631992
    34856630480
    34856629976
    34856632048
    34856631040
    34856630536
    34856632104
    34856630032
    34856630592
    34856630088
    34856632160
    34856629752
    34856629696
    34856630760
    34856630256
    
    In [17]: random.seed(0)
    
    In [18]: k = set()
    
    In [19]: for n in range (20):
       ....:         k.add(bigSet())
       ....:     
    
    In [20]: for x in k:
       ....:         print(id(x))
       ....:     
    34484534800
    34856629808
    34484534856
    34856629864
    34856631936
    34856630424
    34856629920
    34856631992
    34484534968
    34856629976
    34856630480
    34856632048
    34856631040
    34484535024
    34484535080
    34484535136
    34856632216
    34484534688
    34484534912
    34484534744
    

    A possible solution would be to subclass a frozen set.