Tags: python, parallel-processing, multiprocessing, global-variables

How do global variables work in parallel programming with Python?


I have the code below. In the sequential approach the message "not ok" is printed, while in the parallel approach ["ok", "ok", "ok"] is printed instead of the ["not ok", "not ok", "not ok"] that I expected.

How can I change the variable globVar without passing it as an argument to the test function?

import multiprocessing

global globVar
globVar = 'ok'

def test(arg1):
    print(arg1)
    return globVar
    
if __name__ == "__main__" :
    globVar = 'not ok'

    #Sequential
    print(test(0))

    #Parallel
    pool = multiprocessing.Pool()
    argList = [0,1,2]
    result = pool.map(test,argList)
    pool.close()
    print(result)

Solution

  • TL;DR: you can skip to the last paragraph for the solution, or read everything to understand what is actually going on.

    You did not tag your question with your platform (e.g. windows or linux), as the guidelines for questions tagged multiprocessing ask you to. This matters because the behavior ("behaviour" for Anglos) of global variables depends very much on the platform, or more precisely on the method multiprocessing uses to start new processes.
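    To see which case applies on your machine, you can ask multiprocessing directly. This small sketch only uses the standard get_start_method() and get_all_start_methods() functions; the helper name describe_start_methods is hypothetical. Expect 'spawn' on Windows, while the default on Linux and macOS depends on your Python version.

```python
import multiprocessing


def describe_start_methods():
    """Return (default start method, all start methods supported here)."""
    # The method a plain multiprocessing.Pool() would use on this platform
    default = multiprocessing.get_start_method()
    # Every start method this platform supports, e.g. ['fork', 'spawn', 'forkserver']
    available = multiprocessing.get_all_start_methods()
    return default, available


if __name__ == "__main__":
    default, available = describe_start_methods()
    print("default start method:", default)
    print("available start methods:", available)
```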

    On platforms that use the spawn method to create new processes, such as Windows (and macOS since Python 3.8), every process in the pool created by your pool = multiprocessing.Pool() statement starts from a new, empty address space: a fresh Python interpreter is launched that re-reads and re-executes your source program to initialize that address space before it finally calls the worker function test. That means every statement at global scope (import statements, variable assignments, function definitions, etc.) is executed again in each new process. However, in the new subprocess the variable __name__ is not "__main__", so statements within the if __name__ == "__main__": block are not executed. That is why on Windows you must put the code that creates new processes inside such a block. Otherwise every new process would try to create a pool of its own, which would be an infinite, recursive process-creation loop; in practice multiprocessing detects this and raises a RuntimeError.
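    A minimal sketch of the __name__ point, assuming you run it as a script file (spawn has to re-import that file in each worker). It forces the spawn start method with multiprocessing.get_context("spawn") so it behaves the same on every platform; the helper names are hypothetical. Each worker reports the value of __name__ it sees, which is not "__main__".

```python
import multiprocessing


def report_module_name(_):
    # The module name as seen by the process running this function.
    # In a spawned worker CPython re-imports the script as '__mp_main__'.
    return __name__


def names_in_spawned_workers(n=3):
    # Request spawn explicitly instead of relying on the platform default
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(2) as pool:
        return pool.map(report_module_name, range(n))


if __name__ == "__main__":
    print("parent:", __name__)                   # parent: __main__
    print("workers:", names_in_spawned_workers())
```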

    So if you are running under Windows, your main process sets globVar to 'not ok' just before creating the pool. But when the pool's processes are initialized, before test is ever called, your source is re-executed, and each process, which runs in its own address space and therefore has its own copy of globVar, re-initializes that variable to 'ok'. That is the value test sees. It also follows that if test modified its process's copy of globVar, the change would not be reflected back in the main process.
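    The following sketch reproduces that Windows behavior on any platform by forcing the spawn start method, again assuming it is run as a script file. read_global, set_and_return and run_spawn are hypothetical helpers standing in for your test function. It shows both halves of the argument: the workers see the re-initialized 'ok', and a worker's assignment to globVar never reaches the parent.

```python
import multiprocessing

globVar = 'ok'  # module-level value; re-executed in every spawned worker


def read_global(_):
    # Returns whatever globVar is in the worker's own address space
    return globVar


def set_and_return(new_value):
    # Rebinds the worker's own copy of globVar; the parent never sees this
    global globVar
    globVar = new_value
    return globVar


def run_spawn(func, args):
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(2) as pool:
        return pool.map(func, args)


if __name__ == "__main__":
    globVar = 'not ok'                                  # only the parent sees this
    print(run_spawn(read_global, [0, 1, 2]))            # workers re-ran globVar = 'ok'
    print(run_spawn(set_and_return, ['changed'] * 3))
    print(globVar)                                      # parent is unaffected
```

    Running it should print ['ok', 'ok', 'ok'], then ['changed', 'changed', 'changed'], and finally not ok.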

    Now on platforms that use fork to create new processes, such as Linux (where fork was the default start method until Python 3.14 switched the default to forkserver), things are a bit different. When the subprocesses are created, each one inherits the parent's address space: the memory pages are initially shared, and a process only gets its own private copy of a page when it tries to modify it ("copy-on-write"). This is a more efficient way to create processes. So in this case test sees globVar with the value 'not ok', because that was its value when the subprocesses were created. But if test updates globVar, copy-on-write ensures that it updates the copy in its own address space, so again the main process does not see the updated value.

    So having worker functions return values, as your test function does, is the standard way of passing results back to the main process. Your actual problem is that the workers do not start with the globVar value you expected. You can solve this by initializing each of the pool's processes with the correct globVar value, using the initializer and initargs arguments of the Pool constructor (see the documentation):

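    import multiprocessing

    # Module-level default; a `global` statement is not needed at module scope
    globVar = 'ok'

    def init_processes(gVar):
        # Called once in every pool process before any task runs:
        # overwrite that process's copy of globVar with the value from initargs
        global globVar
        globVar = gVar

    def test(arg1):
        print(arg1)
        return globVar

    if __name__ == "__main__":
        globVar = 'not ok'

        # Sequential
        print(test(0))

        # Parallel: hand the parent's current globVar to every worker at startup
        pool = multiprocessing.Pool(initializer=init_processes, initargs=(globVar,))
        argList = [0, 1, 2]
        result = pool.map(test, argList)
        pool.close()
        pool.join()
        print(result)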
    

    Prints (the lines 0, 1 and 2 come from the worker processes, so their order may vary):

    0
    not ok
    0
    1
    2
    ['not ok', 'not ok', 'not ok']
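    Because the initializer runs in each worker after the module has been re-executed, this fix also covers the spawn case. A minimal sketch that forces spawn to check this, assuming it is run as a script file; run_with_initializer and read_global are hypothetical helper names.

```python
import multiprocessing

globVar = 'ok'


def init_processes(gVar):
    # Runs once per worker, after spawn has re-executed the module-level code
    global globVar
    globVar = gVar


def read_global(_):
    return globVar


def run_with_initializer(method, value):
    ctx = multiprocessing.get_context(method)
    with ctx.Pool(2, initializer=init_processes, initargs=(value,)) as pool:
        return pool.map(read_global, [0, 1, 2])


if __name__ == "__main__":
    globVar = 'not ok'
    # spawn is the start method under which the original code returned 'ok'
    print(run_with_initializer("spawn", globVar))
```

    This should print ['not ok', 'not ok', 'not ok'] regardless of the platform's default start method.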