
How are parent process global variables copied to sub-processes in Python multiprocessing?


Ubuntu 20.04

My understanding of how sub-processes access global variables in Python is this:

  1. Global variables (say, b) are made available to each sub-process in a copy-on-write fashion
  2. If a sub-process modifies that variable, a copy of b is created first and that copy is then modified. This change would not be visible to the parent process (I will ask a question on this part later)
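Point 1 can be sketched with a toy example, assuming Linux's default fork start method (all names here are illustrative, not from the question's code):

```python
import multiprocessing as mp

data = list(range(5))  # a module-level global defined in the parent

def worker(q):
    # The child never receives `data` as an argument; with the fork start
    # method it simply sees the parent's global through the copy-on-write view.
    q.put(sum(data))

def run_demo():
    ctx = mp.get_context("fork")  # fork is the default start method on Linux
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    result = q.get()
    p.join()
    return result

if __name__ == "__main__":
    print(run_demo())  # prints 10
```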

I ran a few experiments to understand when the object gets copied, but I could not conclude much from them:

Experiments:

import numpy as np
import multiprocessing as mp
import psutil
b = np.arange(200000000).reshape(-1, 100).astype(np.float64)  # ~1.6 GB of float64 data

Then I tried to see how the memory consumption changes using the below-mentioned function:

def f2():
    print(psutil.virtual_memory().used/(1024*1024*1024))
    global b
    print(psutil.virtual_memory().used/(1024*1024*1024))
    b = b + 1 ### this statement is varied across experiments; results for each variant are listed below
    print(psutil.virtual_memory().used/(1024*1024*1024))

p2 = mp.Process(target=f2)
p2.start()
p2.join()

Results format:

statement used in place of b = b + 1
print 1 (before global b)
print 2 (after global b, before the statement)
print 3 (after the statement)
Comments and questions

Results:

b = b+1
6.571144104003906
6.57244873046875
8.082862854003906 
Only a copy-on-write view was provided, so there was no memory consumption until b = b + 1 executed. At that point a copy of b was created, hence the spike in memory usage

b[:, 1] = b[:, 1] + 1
6.6118621826171875
6.613414764404297
8.108139038085938
Only a copy-on-write view was provided, so there was no memory consumption until b[:, 1] = b[:, 1] + 1 executed. It seems that even if only part of the array is updated (here just one column), the entire object gets copied. Seems fair (so far)

b[0, :] = b[0, :] + 1
6.580562591552734
6.581851959228516
6.582511901855469
NO MEMORY CHANGE! Modifying a column copied the entire b, but modifying a row does not create a copy? Can you please explain what happened here?


b[0:100000, :] = b[0:100000, :] + 1
6.572498321533203
6.5740814208984375
6.656215667724609
Slight memory spike, presumably a partial copy since I modified just the first 1/20th of the rows. But then modifying a column should also have produced a partial copy, not the full copy we saw in case 2 above. No? Can you please explain what happened here as well?

b[0:500000, :] = b[0:500000, :] + 1
6.593017578125
6.594577789306641
6.970676422119141
The partial-copy assumption seems right: a moderate memory spike reflecting the change to 1/4th of the total rows

b[0:1000000, :] = b[0:1000000, :] + 1
6.570674896240234
6.5723876953125
7.318485260009766
In-line with partial copy hypothesis


b[0:2000000, :] = b[0:2000000, :] + 1
6.594249725341797
6.596080780029297
8.087333679199219
A full copy, since we are now modifying the entire array. This is equivalent to b = b + 1, just expressed as a slice over all the rows

b[0:2000000, 1] = b[0:2000000, 1] + 1
6.564876556396484
6.566963195800781
8.069766998291016
Again a full copy. It seems that row slices produce a partial copy while a column slice produces a full copy, which is weird to me. Can you please help me understand the exact copy semantics of a parent's global variables in a child process?
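One layout detail is worth noting here: NumPy arrays are C-contiguous by default, so a row occupies one contiguous run of bytes while a column is scattered across the entire buffer. A minimal sketch with a smaller array of the same 100-column row width (illustrative only, so it is cheap to run):

```python
import numpy as np

# Smaller stand-in with the same row width as `b` in the question.
a = np.arange(1000 * 100).reshape(-1, 100).astype(np.float64)

print(a.flags['C_CONTIGUOUS'])  # True: rows are stored contiguously
print(a.strides)                # (800, 8): the next row is 800 bytes away,
                                # the next column element only 8 bytes away

# So one row is a single 800-byte run, while one column is 1000 elements
# spaced 800 bytes apart across the whole buffer.
```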

As you can see, I cannot justify the results I am seeing in the experimental setup described above. Can you please help me understand how the parent process's global variables are copied upon full/partial modification by the child process?

I have also read that:

The child gets a copy-on-write view of the parent memory space. As long as you load the dataset before firing the processes and you don't pass a reference to that memory space in the multiprocessing call (that is, workers should use the global variable directly), then there is no copy.

Question 1: What does "As long as you load the dataset before firing the processes and you don't pass a reference to that memory space in the multiprocessing call (that is, workers should use the global variable directly), then there is no copy" mean?

As answered by Tim Roberts below, it means:

If you pass the dataset as a parameter, then Python has to make a copy to transfer it over. The parameter passing mechanism doesn't use copy-on-write, partly because the reference counting stuff would be confused. When you create it as a global before things start, there's a solid reference, so the multiprocessing code can make copy-on-write happen.

However, I was not able to verify this behavior. Here are the tests I ran to verify it:

import numpy as np
import multiprocessing as mp
import psutil
b = np.arange(200000000).reshape(-1, 100).astype(np.float64)  # ~1.6 GB of float64 data

Then I tried to see how the memory consumption changes using the below-mentioned function:

def f2(b): ### the array is passed as an argument, not picked up from the parent's globals
    print(psutil.virtual_memory().used/(1024*1024*1024))
    b = b + 1 ### this statement is varied across experiments; results for each variant are listed below
    print(psutil.virtual_memory().used/(1024*1024*1024))

print(psutil.virtual_memory().used/(1024*1024*1024))
p2 = mp.Process(target=f2, args=(b,)) ### the array is passed as an argument, not picked up from the parent's globals
p2.start()
p2.join()

Results format: as above, except that print 1 comes from the parent (before starting the process) and prints 2 and 3 come from inside f2, before and after the statement

Results:

b = b+1
6.692680358886719
6.69635009765625
8.189273834228516
Print 2 comes from inside the function, so if passing the argument had made a copy during the transfer, print 2 should already read around 8.18. Instead, the spike only appears after b = b + 1

b = b
6.699306488037109
6.701808929443359
6.702671051025391
Prints 2 and 3 should both have been around 8.18 if passing the argument made a copy. The results suggest that no copy is created even though the array b is passed to the function as an argument

Solution

  • Copy-on-write works one virtual-memory page at a time. As long as your changes stay within a single 4096-byte page, you'll only pay for that one page. When you modify a column, your changes are spread across many, many pages. We Python programmers aren't used to worrying about the layout in physical memory, but that's the issue here.
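    The page-at-a-time argument can be checked with simple arithmetic: each row of the array is 100 × 8 = 800 contiguous bytes, so a row write dirties at most two pages, while a column write puts one 8-byte element on every page the 1.6 GB buffer spans. A sketch of that arithmetic (page size assumed to be the common 4096 bytes):

```python
PAGE = 4096                            # typical x86-64 page size in bytes (assumed)
ROWS, COLS, ITEM = 2_000_000, 100, 8   # shape and float64 itemsize from the question

row_bytes = COLS * ITEM                # 800 bytes: one row fits inside a page
total_pages = ROWS * row_bytes // PAGE # 390625 pages back the whole buffer

# Writing one row dirties a single contiguous 800-byte span: at most 2 pages.
# Writing one column dirties one 8-byte element every 800 bytes, which lands
# on every single page of the buffer:
pages_touched_by_column = len({(i * row_bytes) // PAGE for i in range(ROWS)})
print(pages_touched_by_column, total_pages)  # 390625 390625
```

    This is why the column update behaved like a full copy while the row-slice updates scaled with the fraction of rows touched.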

    Question 1: If you pass the dataset as a parameter, then Python has to make a copy to transfer it over. The parameter passing mechanism doesn't use copy-on-write, partly because the reference counting stuff would be confused. When you create it as a global before things start, there's a solid reference, so the multiprocessing code can make copy-on-write happen.
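    This transfer-copy explanation matches the spawn start method, where Process arguments really are pickled to the child. Under Linux's default fork start method (as on Ubuntu 20.04), the argument tuple already exists in the Process object before the fork and is never serialized, which would explain why the verification experiment above saw no copy when b was passed as an argument. A minimal sketch of the serialization cost under spawn, using a small stand-in array so it is cheap to run:

```python
import pickle

import numpy as np

# Hypothetical small stand-in for the 1.6 GB array in the question
# (1,000,000 float64 elements, i.e. 8 MB).
b_small = np.arange(1_000_000).reshape(-1, 100).astype(np.float64)

# Under spawn, mp.Process(target=f2, args=(b_small,)) pickles the argument
# tuple to ship it to the child; the serialized payload is essentially the
# whole buffer plus a small header.
payload = len(pickle.dumps(b_small, protocol=pickle.HIGHEST_PROTOCOL))
print(payload >= b_small.nbytes)  # True: the full 8 MB must be transferred
```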