python random python-multiprocessing

Multiprocessing, pooling and randomness


I am running into something strange. I wrote a program that simulates economies. Instead of running the simulations one at a time on a single CPU core, I want to use multiprocessing to speed things up. The code runs fine, and I want to collect some statistics from the simulations. Here is the surprise: all the simulations that run at the same time yield exactly the same result! Is there some strange interaction between Pool() and random.seed()?

To make this clearer, here is a summary of the code:

from multiprocessing import Pool

class Economy(object):
    def __init__(self,i):
        self.run_number = i
        self.Statistics = Statistics()
        self.process()

def run_and_return(i):
    eco = Economy(i)
    return eco

collection = []
def get_result(x):
    collection.append(x)

if __name__ == '__main__':
    pool = Pool(processes=4)
    for i in range(NRUN):
        pool.apply_async(run_and_return, (i,), callback=get_result)
    pool.close()
    pool.join()

The process method goes through every step of the simulation, for i steps. Basically, I simulate NRUN economies and collect the Statistics of each one in the list collection.

Now the strange thing is that the output of this is exactly the same for the first 4 runs: during the same "wave" of simulations, I get the very same output. Once I get to the second wave, then I get a different output for the next 4 simulations!

Everything works as expected when I run the same program with processes=1: taking the simulations one by one on a single core gives different results each time. I have tried a few things but can't get my head around this, hence my post.

Thank you very much for taking the time to read this long post. Do not hesitate to ask for more details!

All the best,


Solution

  • If you are on Linux and the pool uses the fork start method (historically the default there), each pool process is created by forking the parent process. The process is literally duplicated, and that includes the state of any random number generator it holds.

    The random module seeds the generator behind its module-level functions at import time, so the seed has already been chosen before you create the Pool.

    To get around this you must use an initialiser for each pool process that sets the random seed to something unique.

    A decent way to seed random is to combine the process id and the current time. The process id is unique among the workers within a single run of your program, while the time keeps the seeds distinct across runs in case the same process id comes up again. Passing the process id and time as a string means the seed is derived from a digest of that string, so two similar strings produce substantially different seeds. Alternatively, you could use the uuid module to generate seeds.

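    import os, random, time
    from multiprocessing import Pool

    def proc_init():
        # Runs once in each worker: reseed from the process id and the time.
        random.seed(str(os.getpid()) + str(time.time()))

    pool = Pool(num_procs, initializer=proc_init)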
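
The initialiser approach can be put together as a small self-contained script. This is only a sketch under a few assumptions that are not in the question: `draw` is a hypothetical stand-in for one simulation run, a module-level `random.Random` instance called `rng` stands in for whatever generator the simulation uses (an explicit instance is used because its state is plainly copied on fork), and the fork start method is requested explicitly, so the script is meant for Linux and other Unix systems rather than Windows. `proc_init_uuid` illustrates the uuid alternative.

```python
import multiprocessing as mp
import os
import random
import time
import uuid

# Hypothetical generator used by the simulation. It is created in the parent,
# so every forked worker starts with a copy of the same state.
rng = random.Random(12345)


def proc_init():
    # Runs once in each worker process: reseed from the pid plus the time.
    rng.seed(str(os.getpid()) + str(time.time()))


def proc_init_uuid():
    # Alternative mentioned in the answer: seed from a random UUID instead.
    rng.seed(uuid.uuid4().int)


def draw(i):
    # Stand-in for one simulation run: returns the run number and one draw.
    return i, rng.random()


def run(n_runs, n_procs, initializer):
    # Ask for the fork start method explicitly so workers are forked copies.
    ctx = mp.get_context("fork")
    with ctx.Pool(n_procs, initializer=initializer) as pool:
        return pool.map(draw, range(n_runs))


if __name__ == "__main__":
    for init in (proc_init, proc_init_uuid):
        results = run(8, 4, init)
        print(init.__name__, [round(x, 3) for _, x in results])
```

With either initialiser, the draws printed for the eight runs should all differ, including those that ran in the same "wave". The values themselves change from run to run because the seeds depend on the process id, the time, or a random UUID.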