Search code examples
broadcastdaskdask-distributed

Dask scatter broadcast a list


what is the appropriate way to scatter broadcast a list using Dask disitributed?

case 1 - wrapping the list:

[future_list] = client.scatter([my_list], broadcast=True)

case 2 - not wrapping the list:

future_list = client.scatter(my_list, broadcast=True)

In the Dask documentation I have seen both examples: 1. wrapping (see bottom example) and 2. not wrapping. In my experience case 1 is the best approach, in case 2 constructing the Dask graph (large in my use case) takes a lot longer.

What could explain the difference in graph construction time? Is this expected behaviour?

Thanks in advance.

Thomas


Solution

  • If you call scatter with a list then Dask will assume that each element of that list should be scattered independently.

    a, b, c = client.scatter([1, 2, 3], ...)
    

    If you don't want this, if you actually just want your list to be moved around as a single piece of data, then you should wrap it in another list

    [future] = client.scatter([[1, 2, 3]], ...)