Search code examples
pythoncopymultiprocessingdeep-copy

Create a copy of an object rather that reinitialising inside of a new multiprocessing process


This code shows the structure of what I am trying to do.

import multiprocessing
from foo import really_expensive_to_compute_object

## Create a really complicated object that is *hard* to initialise.
T = really_expensive_to_compute_object(10) 

def f(x):
  return T.cheap_calculation(x)

P = multiprocessing.Pool(processes=64)
results = P.map(f, range(1000000))

print results

The problem is that each process starts by spending a lot of time recalculating T instead of using the original T that was computed once. Is there a way to prevent this? T has a fast (deep) copy method, so can I get Python to use that instead of recalculating?


Solution

  • Why not have f take a T parameter instead of referencing the global, and do the copies yourself?

    import multiprocessing, copy
    from foo import really_expensive_to_compute_object
    
    ## Create a really complicated object that is *hard* to initialise.
    T = really_expensive_to_compute_object(10) 
    
    def f(t, x):
      return t.cheap_calculation(x)
    
    P = multiprocessing.Pool(processes=64)
    results = P.map(f, (copy.deepcopy(T) for _ in range(1000000)), range(1000000))
    
    print results