I love the IPython notebook, but when doing parallel processing with ipyparallel, pushing every local object to the engines with dview.push(dict(...)) doesn't seem like an effective way to work. Are there any effective alternatives?
I usually do something like this:

import ipyparallel as ipp

rc = ipp.Client()
dview = rc[:]
dview.push(dict(x1=x1, x2=x2, ...., x100=x100))  # => I'd like to avoid this!!
res = dview.map_async(run_once, range(5))
map_result = res.get()
rc.close()
IPython Parallel doesn't resolve closures before sending functions by default. That means that when you send run_once to the engines and the body of run_once looks for x1, it looks up x1 on the engine instead of carrying along a copy of x1 from your client. This can be useful because it lets you do SPMD operations by changing what x1 means on each engine. For instance, this snippet relies on rank having a different value on each engine in order to work properly:
dview.scatter('rank', rc.ids, flatten=True)  # each engine gets its own id as `rank`

def mul_by_rank(x):
    return x * rank  # `rank` is resolved on the engine at call time

dview.map_sync(mul_by_rank, range(len(rc)))
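This name-lookup behavior can be seen with plain pickle itself, since pickle serializes a module-level function by reference (module plus name) rather than by value. The sketch below is not ipyparallel-specific; it just mimics what happens when a function is rebuilt in a namespace where the global has a different value, as it would be on an engine:

```python
import pickle

x1 = 10

def run_once(i):
    return x1 + i

# The payload records only where to find run_once, not its code or x1.
payload = pickle.dumps(run_once)

x1 = 100  # stand-in for "x1 means something else on the engine"
f = pickle.loads(payload)
print(f(0))  # -> 100: x1 is resolved at call time, in the current namespace
```

Because the function travels by reference, the unpickled f is the very same object as run_once, and it reads whatever x1 currently is.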
If you do want to resolve closures when you send a function (that is, implicitly send x1, x2, etc. along with run_once), you can use other serialization libraries which do this. One of these is cloudpickle, which you can enable with dview.use_cloudpickle() (you must have installed cloudpickle first). If you do this, then the local variables that run_once relies on should be sent along with run_once:
dview.use_cloudpickle()
dview.map_sync(run_once, range(5))
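If you'd rather stay on the default serializer, another option in the spirit of the original question is to push only the names the function actually references, instead of every local. The helper below is a hypothetical sketch, not part of the ipyparallel API; it reads global names out of the function's code object (note that co_names also lists attribute names, so treat it as a heuristic):

```python
def needed_globals(func):
    # co_names lists the global (and attribute) names the bytecode references;
    # keep only those that actually exist in the function's globals.
    return {name: func.__globals__[name]
            for name in func.__code__.co_names
            if name in func.__globals__}

x1, x2 = 10, 20

def run_once(i):
    return x1 + x2 + i

print(needed_globals(run_once))  # {'x1': 10, 'x2': 20}
# dview.push(needed_globals(run_once))  # then map_async as before
```

This keeps the explicit push but shrinks it to exactly what run_once needs, so you don't have to enumerate x1 through x100 by hand.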