python, tensorflow, multiprocessing

How to prevent multiprocessing from inheriting imports and globals?


I'm using multiprocessing in a larger code base where some of the import statements have side effects. How can I run a function in a background process without having it inherit global imports?

# helper.py:

print('This message should only print once!')
# main.py:

import multiprocessing as mp
import helper  # This prints the message.

def worker():
  pass  # Unfortunately this also prints the message again.

if __name__ == '__main__':
  mp.set_start_method('spawn')
  process = mp.Process(target=worker)
  process.start()
  process.join()

Background: Importing TensorFlow initializes CUDA, which reserves some amount of GPU memory. As a result, spawning too many processes leads to a CUDA OOM error, even though the processes don't use TensorFlow.

Similar question without an answer:


Solution

  • Is there a resource that explains exactly what the multiprocessing module does when starting an mp.Process?

    Super quick version (using the spawn context, not fork):

    Some setup (a pair of pipes for communication, cleanup callbacks, etc.) is done, then a new process is created with fork() followed by exec(); on Windows it's CreateProcessW(). The new Python interpreter is launched with the -c switch running a small bootstrap, spawn_main(), and the communication pipe file descriptors are passed to it via that crafted command string. The bootstrap cleans up the environment a little, unpickles the Process object from its communication pipe, and finally calls the process object's run() method.
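
    That unpickling step has a visible consequence: the Process object is serialized in the parent and rebuilt in the child, so the target has to be picklable. A minimal sketch to illustrate (my own example, not from the original answer):

    import multiprocessing as mp

    def module_level_worker():
        print('A picklable, module-level target works under spawn.')

    if __name__ == '__main__':
        ctx = mp.get_context('spawn')

        # Works: the function is pickled by reference to its module and name.
        p = ctx.Process(target=module_level_worker)
        p.start()
        p.join()

        # Fails in the parent during start(): a lambda cannot be pickled.
        try:
            ctx.Process(target=lambda: None).start()
        except Exception as exc:
            print(f'spawn rejected the unpicklable target: {exc!r}')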

    So what about importing modules?

    Pickle semantics handle some of it, but __main__ and sys.modules need some extra care, which is handled during the "cleans up the environment" bit of the bootstrap.
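
    In practice this means the child re-imports main.py (as __mp_main__), so its module-level statements, including side-effect imports, run again, while the body of the if __name__ == '__main__': guard does not. A minimal sketch of restructuring the question's main.py along those lines (my illustration, not part of the original answer):

    # main.py:

    import multiprocessing as mp

    def worker():
        pass  # The spawned child never imports helper now.

    def main():
        import helper  # The side-effect import only runs in the parent process.
        process = mp.Process(target=worker)
        process.start()
        process.join()

    if __name__ == '__main__':
        mp.set_start_method('spawn')
        main()

    The same idea applies to the TensorFlow case in the question: importing tensorflow inside the functions that actually use it keeps CUDA initialization out of spawned workers that never touch it.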