python dask dask-delayed

How to find inputs of dask.delayed task?


Given a dask.delayed task, I want to get a list of all the inputs (parents) for that task.

For example,

from dask import delayed

@delayed
def inc(x):
    return x + 1

def inc_list(x):
    return [inc(n) for n in x]

task = delayed(sum)(inc_list([1,2,3]))
# task.parents  ???  (no such attribute exists; this is what I would like to get)

This yields the following graph. How can I get the parents of sum#3 as a list [inc#1, inc#2, inc#3]?

[Image: task graph for the example above]


Solution

  • Delayed objects don't store references to their inputs. However, you can recover them if you're willing to dig into the task graph a bit and reconstruct the Delayed objects manually.

    In particular, you can index into the .dask attribute with the delayed object's key:

    >>> task.dask[task.key]
    (<function sum>,
     ['inc-9d0913ab-d76a-4eb7-a804-51278882b310',
      'inc-2f0e385e-beef-45e5-b47a-9cf5d02e2c1f',
      'inc-b72ce20f-d0c4-4c50-9a88-74e3ef926dd0'])
    

    This shows the task definition (see Dask's graph specification): a tuple whose first element is the function and whose remaining elements are its arguments.

    The 'inc-...' values are other keys in the task graph. You can get the dependencies of a key with the dask.core.get_dependencies function:

    >>> from dask.core import get_dependencies
    >>> get_dependencies(task.dask, task.key)
    {'inc-2f0e385e-beef-45e5-b47a-9cf5d02e2c1f',
     'inc-9d0913ab-d76a-4eb7-a804-51278882b310',
     'inc-b72ce20f-d0c4-4c50-9a88-74e3ef926dd0'}
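
To make the idea concrete, here is a pure-Python sketch of how dependencies can be read off a task in the dict-based graph format shown above. It is an illustration only, not Dask's actual implementation: `find_dependencies` and the short keys such as "inc-1" are made up for this example.

```python
from operator import add


def find_dependencies(graph, key):
    """Return the graph keys referenced by the task stored at `key`.

    Hypothetical helper for illustration; Dask's own function is
    dask.core.get_dependencies.
    """
    found = set()
    stack = [graph[key]]
    while stack:
        item = stack.pop()
        if isinstance(item, tuple) and item and callable(item[0]):
            # A task: (function, arg1, arg2, ...). Inspect the arguments.
            stack.extend(item[1:])
        elif isinstance(item, list):
            # Lists of arguments may contain keys as well.
            stack.extend(item)
        else:
            try:
                if item in graph:
                    found.add(item)
            except TypeError:
                # Unhashable literals cannot be keys.
                pass
    return found


# A hand-written graph with the same shape as the one in the question.
graph = {
    "inc-1": (add, 1, 1),
    "inc-2": (add, 2, 1),
    "inc-3": (add, 3, 1),
    "sum": (sum, ["inc-1", "inc-2", "inc-3"]),
}

print(sorted(find_dependencies(graph, "sum")))  # ['inc-1', 'inc-2', 'inc-3']
```

The result is a set, which is why the real keys above come back in a different order than they appear in the task definition.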
    

    From here you can construct new Delayed objects if you wish:

    >>> from dask.delayed import Delayed
    >>> parents = [Delayed(key, task.dask) for key in get_dependencies(task.dask, task.key)]
    >>> parents
    [Delayed('inc-b72ce20f-d0c4-4c50-9a88-74e3ef926dd0'),
     Delayed('inc-2f0e385e-beef-45e5-b47a-9cf5d02e2c1f'),
     Delayed('inc-9d0913ab-d76a-4eb7-a804-51278882b310')]
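
The steps above can be wrapped in a small helper. This is a sketch, assuming the installed Dask version still accepts `Delayed(key, graph)` and `get_dependencies(graph, key)` as used in this answer; `get_parents` is a hypothetical name and not part of Dask.

```python
from dask import delayed
from dask.core import get_dependencies
from dask.delayed import Delayed


def get_parents(task):
    """Rebuild Delayed objects for the direct inputs of `task`.

    Hypothetical helper, not a Dask API. The order of the returned
    list is arbitrary because the dependencies come back as a set.
    """
    keys = get_dependencies(task.dask, task.key)
    return [Delayed(key, task.dask) for key in keys]


@delayed
def inc(x):
    return x + 1


task = delayed(sum)([inc(n) for n in [1, 2, 3]])
parents = get_parents(task)

# Sort the computed values, since the parent order is not guaranteed.
print(sorted(p.compute() for p in parents))  # [2, 3, 4]
```

Each returned object shares the original task graph, so computing a parent runs only the part of the graph that parent depends on.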