I am trying to run a DAG of tasks using the Dask API for my application. As a contrived example, I want tasks to return their success/failure flags and use those as inputs to other tasks.
However, Dask does not let me make __bool__ calls (a and b) on delayed objects. How is that different from bitwise boolean operations (i.e. a & b)?
Why is it implemented as unsupported, and how hard would it be to fix locally?
I tried digging into the source code, but I couldn't understand how a & b successfully returns a sub-graph of ('and_', 'a', 'b'), while a and b does not return something like ('__bool__1', 'a'), ('__bool__2', 'b'), ('and_', '__bool__1', '__bool__2').
Here is the simplest code I could write to reproduce the problem:
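import dask
from time import sleep


@dask.delayed
def task(x, cond):
    # Skip the work and report failure if the upstream flag is False.
    if not cond:
        return False
    sleep(x)
    return True


def run_graph():
    task1_done = task(2, True)
    task2_done = task(1, True)
    task3_done = task(1, task2_done)
    # This line fails: `and` calls bool() on a Delayed object,
    # which raises TypeError while the graph is still being built.
    all_done = task1_done and task3_done
    return all_done


if __name__ == '__main__':
    done = run_graph()
    dask.compute(done)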
If we replace the and operation with &, it works fine:
    all_done = task1_done & task3_done
This might not be an issue here, but I want to use the built-in all() and any() functions on a list of delayed flags, and those call __bool__ internally.
I don't know Dask in detail, but I suspect that it simply implements __and__ on its objects. This does not convert the object to a boolean at all, unlike and, or, etc., which first ask the left operand for its truth value via __bool__.
This can be quickly tested with a small test class:
In [1]: class Test:
   ...:     def __and__(self, other):
   ...:         print("And called!")
   ...:         return self
   ...:     def __bool__(self):
   ...:         print("Bool called!")
   ...:         return True
   ...:

In [2]: a = Test()

In [3]: b = Test()

In [4]: a & b
And called!
Out[4]: <__main__.Test at 0x7f5eb58f4eb8>

In [5]: a and b
Bool called!
Out[5]: <__main__.Test at 0x7f5eb587e400>
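The same experiment can be pushed one step further to show why a lazy library cannot make and build a graph node the way & does. The sketch below uses a hypothetical Node class (an illustration only, not Dask's actual implementation): __and__ is free to return a new lazy node, but Python insists that __bool__ return a real bool, so there is nothing lazy it could hand back.

```python
class Node:
    """Hypothetical stand-in for a lazy object; not Dask's real class."""

    def __init__(self, label):
        self.label = label

    def __and__(self, other):
        # __and__ may return any object, so it can describe the
        # operation as a new node instead of computing it.
        return Node(f"and_({self.label}, {other.label})")

    def __bool__(self):
        # Trying to stay lazy here: Python rejects any non-bool result.
        return Node(f"bool({self.label})")


a, b = Node("a"), Node("b")

combined = a & b
print(combined.label)

error = None
try:
    # `and` asks the left operand for its truth value before
    # deciding which operand to return.
    result = a and b
except TypeError as exc:
    error = str(exc)
    print(error)
```

The & expression yields a node describing the operation, while the and expression fails with a TypeError as soon as Python sees that __bool__ did not return a bool.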
Since Dask does delayed evaluation, as I understand it, __bool__ would probably have to force immediate evaluation to work, while __and__ can return a lazy object (since it returns an object of the same type, not a boolean).
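For the all() and any() case from the question, one possible workaround (a sketch, not necessarily the only or recommended approach) is to wrap the built-ins with dask.delayed so that they run at compute time on the resolved booleans, rather than calling bool() on Delayed objects while the graph is being built. The task function here is a simplified stand-in for the one in the question, with the sleep removed; the names flags, all_done, any_done and and_done are made up for illustration.

```python
import operator
from functools import reduce

import dask
from dask import delayed


@delayed
def task(x, cond):
    # Simplified stand-in for the question's task (no sleep).
    if not cond:
        return False
    return True


task1_done = task(2, True)
task2_done = task(1, True)
task3_done = task(1, task2_done)
flags = [task1_done, task3_done]

# all/any become graph nodes; they receive plain booleans when
# the graph is executed.
all_done = delayed(all)(flags)
any_done = delayed(any)(flags)

# Alternative that stays with the & operator the question found to work.
and_done = reduce(operator.and_, flags)

all_result, any_result, and_result = dask.compute(all_done, any_done, and_done)
print(all_result, any_result, and_result)
```

Note that the reduce variant applies & pairwise, so it only matches all() when every flag is a real bool.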