I am stuck in a strange place. I have a bunch of delayed function calls that I want to execute in a certain order. While executing them in parallel is trivial:
res = client.compute(myfuncs)  # myfuncs is a list of delayed objects
res = client.gather(res)
I can't seem to find a way to execute them in sequence, in a non-blocking way.
Here's a minimal example:
import numpy as np
from time import sleep
from datetime import datetime

from dask import delayed
from dask.distributed import LocalCluster, Client


@delayed
def dosomething(name):
    # record start and end timestamps around a random-length sleep
    res = {"name": name, "beg": datetime.now()}
    sleep(np.random.randint(10))
    res.update(rand=np.random.rand())
    res.update(end=datetime.now())
    return res
# seq1 should run in sequence; par1 and par2 can run in parallel
seq1 = [dosomething(name) for name in ["foo", "bar", "baz"]]
par1 = dosomething("whaat")
par2 = dosomething("ahem")

pipeline = [seq1, par1, par2]
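Right now I just compute everything at once, which runs all five tasks in parallel (a sketch, using the imports above):

client = Client(LocalCluster())
futures = client.compute(seq1 + [par1, par2])  # no ordering enforced here
results = client.gather(futures)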
Given the above example, I would like to run seq1, par1, and par2 in parallel, but the constituents of seq1 ("foo", "bar", and "baz") in sequence.
You could definitely cheat and add an optional dependency to your function as follows:
@delayed
def dosomething(name, *args):
    # *args exists only to carry dependencies; the values are never used
    ...
That way you can make tasks depend on one another, even though you don't use one call's result in the next:
inputs = ["foo", "bar", "baz"]
seq1 = [dosomething(inputs[0])]
for bit in inputs[1:]:
    # passing the previous task makes each call wait for its predecessor
    seq1.append(dosomething(bit, seq1[-1]))
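Putting it together, a minimal runnable sketch (the start/end timestamps are dropped for brevity, and the cluster setup is illustrative):

import numpy as np
from time import sleep
from dask import delayed
from dask.distributed import LocalCluster, Client

@delayed
def dosomething(name, *args):
    # *args only creates dependencies; the values are ignored
    sleep(np.random.randint(10))
    return {"name": name, "rand": np.random.rand()}

client = Client(LocalCluster())

inputs = ["foo", "bar", "baz"]
seq1 = [dosomething(inputs[0])]
for bit in inputs[1:]:
    seq1.append(dosomething(bit, seq1[-1]))

par1 = dosomething("whaat")
par2 = dosomething("ahem")

# "foo" -> "bar" -> "baz" run in order; "whaat" and "ahem" run alongside
results = client.gather(client.compute(seq1 + [par1, par2]))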
Alternatively, you can read about the distributed scheduler's "futures" interface, whereby you can monitor the progress of tasks in real time.
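For instance, a sketch of the same ordering trick with futures (client.submit is the real entry point; the function is a plain one here, since submit does the scheduling):

import numpy as np
from time import sleep
from dask.distributed import LocalCluster, Client

def dosomething(name, *args):
    sleep(np.random.randint(10))
    return {"name": name, "rand": np.random.rand()}

client = Client(LocalCluster())

# each submit receives the previous future, so the tasks run in order
futures = [client.submit(dosomething, "foo")]
for name in ["bar", "baz"]:
    futures.append(client.submit(dosomething, name, futures[-1]))

# work starts immediately in the background; gather blocks until done
print(client.gather(futures))

While the tasks run, each future's status attribute (or distributed.as_completed) lets you watch progress.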