
IPython parallel: pushing every object is messy


I love the IPython notebook, but for parallel processing with ipyparallel, pushing every local object to the engines with dview.push(dict(...)) doesn't seem like an efficient approach. Are there any better alternatives?

I usually do something like:

from IPython import parallel

rc = parallel.Client()
dview = rc[:]
dview.push(dict(x1=x1, x2=x2, ..., x100=x100)) # => I'd like to avoid this!!
res = dview.map_async(run_once, range(5))
map_result = res.get()
rc.close()

Solution

  • IPython Parallel doesn't resolve closures before sending functions by default. That means that when you send run_once to the engines and the body of run_once looks up x1, it looks for x1 on the engine rather than carrying along a copy of the x1 on your client. This can be useful because it lets you do SPMD (single program, multiple data) operations by changing what x1 means on each engine. For instance, this snippet relies on rank having a different value on each engine in order to work properly:

    dview.scatter('rank', rc.ids, flatten=True)  # give `rank` a different value on each engine
    
    def mul_by_rank(x):
        return x * rank  # `rank` is looked up on the engine at call time
    
    dview.map_sync(mul_by_rank, range(len(rc)))
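
    Because names are resolved on the engines, you only need to push the names a function actually looks up, not every local object. A minimal sketch of that (assuming run_once only reads x1 and x2; substitute whatever names your function really uses):

    dview.push(dict(x1=x1))   # send only the names run_once references
    dview['x2'] = x2          # dict-style assignment is shorthand for push
    res = dview.map_async(run_once, range(5))
    print(res.get())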
    

    If you do want to resolve closures when you send a function (that is, implicitly send x1, x2, etc. along with run_once), you can use other serialization libraries that do this. One such library is cloudpickle, which you can enable with dview.use_cloudpickle() (you must have cloudpickle installed first). If you do this, the local variables that run_once relies on should be sent along with it:

    dview.use_cloudpickle()
    dview.map_sync(run_once, range(5))
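
    End to end, a minimal sketch of the cloudpickle route (the run_once body and the x1/x2 values here are placeholders rather than code from the question, and cloudpickle must be installed):

    from IPython import parallel  # with newer IPython: import ipyparallel; rc = ipyparallel.Client()

    x1, x2 = 10, 20  # locals that run_once closes over

    def run_once(i):
        # with cloudpickle enabled, x1 and x2 are shipped along with the function
        return i * x1 + x2

    rc = parallel.Client()
    dview = rc[:]
    dview.use_cloudpickle()
    print(dview.map_sync(run_once, range(5)))  # [20, 30, 40, 50, 60]
    rc.close()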