As I understand it, the Python process pool does not share memory (see this question). Instead, each process gets its own copy of the global variables. I am confused by the results of the following program:
import multiprocessing as mp

def worker_function(item):
    x.append(item)
    print(x)

if __name__ == '__main__':
    x = []
    pool = mp.Pool(16)
    jobs = []
    for item in range(10):
        job = pool.apply_async(worker_function, (item,))
        jobs.append(job)
    for job in jobs:
        job.get()
    pool.close()
    pool.join()
    print(f"x at the end of pool execution: {x}")
The output of this program is something like this:
[0]
[0, 2]
[1]
[3]
[4]
[5]
[6]
[7]
[6, 8]
[9]
x at the end of pool execution: []
My interpretation is that the multiprocessing library creates the processes once and copies the global variables only once, at process creation. If two tasks run in the same pool process, they share that process's copy of the globals, so any update the first task makes to them will be seen by the second.
Can someone confirm if my understanding is correct?
You created 16 processes, but that doesn't mean that each task will run on its own process. Some processes may be slow to start up, and since your tasks are so tiny, an already running process may grab a second task.
In your code, each process, not each task, has its own x. So if multiple tasks run on the same process, they will both modify the same x. Hence your results.