python boolean dask dask-delayed

Why does __bool__ built-in function have to raise exception on dask.delayed objects?


I am trying to run a DAG of tasks using the Dask API for my specific application. As a contrived example, I want tasks to return their success/failure flags and use those as inputs to other tasks.

However, Dask does not let me make __bool__ calls (a and b) on delayed objects. How is that different from bitwise boolean ops (i.e. a & b)?

Why is it implemented as unsupported, and how hard would it be to fix locally?

I tried digging into the source code but I couldn't understand how a & b successfully returns a sub-graph of ('and_', 'a', 'b'), but a and b does not return something like ('__bool__1', 'a'), ('__bool__2', 'b'), ('and_', '__bool__1', '__bool__2').

Here is the simplest code I could come up with that reproduces the problem.

import dask
from time import sleep

@dask.delayed
def task(x, cond):
    if not cond:
        return False
    sleep(x)
    return True

def run_graph():
    task1_done = task(2, True)
    task2_done = task(1, True)
    task3_done = task(1, task2_done)

    all_done = task1_done and task3_done
    return all_done

if __name__ == '__main__':
    done = run_graph()
    dask.compute(done)

If we replace the and operation with &, it works fine:

all_done = task1_done & task3_done

This might not be an issue here, but I want to use the all() and any() built-in functions on a list of delayed flags, and those call __bool__ internally.


Solution

  • I don't know Dask in detail, but I suspect that it simply implements __and__ on its objects, which does not convert the object to a boolean at all. The and and or keywords, by contrast, convert the left operand to a boolean first by calling __bool__.

    This can be quickly tested with a small test class:

    In [1]: class Test:
        ...:     def __and__(self, other):
        ...:         print("And called!")
        ...:         return self
        ...:     def __bool__(self):
        ...:         print("Bool called!")
        ...:         return True
        ...:

    In [2]: a = Test()

    In [3]: b = Test()

    In [4]: a & b
    And called!
    Out[4]: <__main__.Test at 0x7f5eb58f4eb8>

    In [5]: a and b
    Bool called!
    Out[5]: <__main__.Test at 0x7f5eb587e400>
    

    Since Dask does delayed evaluation, as far as I understand, __bool__ would probably have to force immediate evaluation to work at all, because Python requires it to return an actual bool. __and__, on the other hand, can return a lazy object (it returns an object of the same type, not a boolean).
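
To make that reasoning concrete, here is a minimal sketch of a lazy object in plain Python. The Lazy class and the lazy_all helper are hypothetical names made up for this illustration; they are not Dask's implementation. The sketch only shows why & can stay lazy while __bool__ cannot, and how folding with & can stand in for all().

```python
import operator
from functools import reduce


class Lazy:
    """Hypothetical stand-in for a delayed value: stores a recipe, not a result."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __and__(self, other):
        # `&` may return any object, so it can simply record a new graph node.
        return Lazy(operator.and_, self, other)

    def __bool__(self):
        # Python requires __bool__ to return a real bool, so there is no way
        # to hand back another lazy node here; refuse instead of computing.
        raise TypeError("truth value of a Lazy object is not supported")

    def compute(self):
        # Evaluate dependencies first, then apply the stored function.
        args = [a.compute() if isinstance(a, Lazy) else a for a in self.args]
        return self.func(*args)


def lazy_all(flags):
    # Lazy replacement for all(): fold the flags with `&` instead of `and`.
    return reduce(operator.and_, flags)


flags = [Lazy(lambda: True), Lazy(lambda: True), Lazy(lambda: False)]
combined = lazy_all(flags)      # still a Lazy node, nothing has run yet
result = combined.compute()     # evaluates the whole chain

try:
    all(flags)                  # calls __bool__ on the first element
    bool_supported = True
except TypeError:
    bool_supported = False
```

With real Dask objects, the same idea should carry over: either fold the flags with & as above, or wrap the built-in itself, e.g. dask.delayed(all)(flags), so that the truth test only happens at compute time.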