Python multiprocessing pool creating duplicate lists


I'm trying to figure out multiprocessing and I've run into something I don't understand at all.

I'm using pathos.multiprocessing for better pickling. The following code creates a list of objects that I want to iterate over. However, when I run it, it prints several different lists even though every print refers to the same variable.

import os
from pathos.multiprocessing import ProcessPool as Pool


class AnyClass:
    def __init__(self):
        pass


def any_function():
    any_list = []

    for i in range(0, 3):
        any_object = AnyClass()
        any_list.append(any_object)

    def particular_function(_argument):
        print(any_list)

    with Pool(os.cpu_count() - 1) as pool:
        pool.map(particular_function, any_list)

    print(any_list)


if __name__ == '__main__':
    any_function()

The output looks like this, with a different list each time.

[<__main__.AnyClass object at 0x7ff03da8ffd0>, <__main__.AnyClass object at 0x7ff03da9c040>, <__main__.AnyClass object at 0x7ff03da9c070>]
[<__main__.AnyClass object at 0x7ff03da9c100>, <__main__.AnyClass object at 0x7ff03da9c130>, <__main__.AnyClass object at 0x7ff03da9c160>]
[<__main__.AnyClass object at 0x7ff03da9c1f0>, <__main__.AnyClass object at 0x7ff03da9c220>, <__main__.AnyClass object at 0x7ff03da9c250>]
[<__main__.AnyClass object at 0x7ff03ac6a4f0>, <__main__.AnyClass object at 0x7ff03ac9ad60>, <__main__.AnyClass object at 0x7ff03da57af0>]

I'm sorry if this is a poor explanation or a bad question, as I'm quite new to Python. Is there any way to fix this, i.e. to get the same list every time?


Solution

  • When you use multiprocessing, the library starts several separate processes, each with its own address space. Each process therefore works on its own copy of the variable, and a change made in one process is not reflected in the others. The copies hold separate objects at separate memory addresses, which is why each printed list shows different addresses.

    In order to use shared memory, you need special constructs to define the variables you want to share. For pathos.multiprocessing, according to this comment, it seems you can declare multiprocessing-style shared variables by importing the following:

    from pathos.helpers import mp as multiprocess
    a = multiprocess.Array('i', 2)  # Declares an integer array of size 2
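If the goal is only for the list to look the same in every process, shared memory may not be needed. Here is a minimal sketch of why the printed lists differ and one way to make them match. It assumes the standard-library pickle module as a stand-in for the serializer pathos uses. The index attribute and the round_trip helper are hypothetical names that are not in the original code.

```python
import pickle


class AnyClass:
    """Stand-in for the question's class; `index` is a hypothetical payload."""

    def __init__(self, index):
        self.index = index

    def __repr__(self):
        # Show the payload instead of the default memory address.
        return f"AnyClass(index={self.index})"

    def __eq__(self, other):
        # Compare by value, so a copy in another process counts as equal.
        return isinstance(other, AnyClass) and self.index == other.index

    def __hash__(self):
        # Keep hashing consistent with the value-based __eq__.
        return hash(self.index)


def round_trip(obj):
    """Serialize and rebuild obj, roughly what happens when data is sent to a worker."""
    return pickle.loads(pickle.dumps(obj))


any_list = [AnyClass(i) for i in range(3)]
copied_list = round_trip(any_list)

print(any_list)                        # [AnyClass(index=0), AnyClass(index=1), AnyClass(index=2)]
print(copied_list)                     # same text as the line above
print(copied_list == any_list)         # True: equal by value
print(copied_list[0] is any_list[0])   # False: a distinct object in memory
```

The rebuilt objects are equal copies rather than the originals, so the default repr, which includes the memory address, differs between processes. With a value-based __repr__ and __eq__, every process prints and compares the list the same way, although changes made in one process still do not propagate to the others.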