What is the appropriate way to scatter and broadcast a list using Dask distributed?
Case 1 - wrapping the list:
[future_list] = client.scatter([my_list], broadcast=True)
Case 2 - not wrapping the list:
future_list = client.scatter(my_list, broadcast=True)
In the Dask documentation I have seen both forms: 1. wrapping (see bottom example) and 2. not wrapping. In my experience case 1 is the better approach: in case 2, constructing the Dask graph (which is large in my use case) takes a lot longer.
What could explain the difference in graph construction time? Is this expected behaviour?
Thanks in advance.
Thomas
If you call scatter with a list, then Dask will assume that each element of that list should be scattered independently.
a, b, c = client.scatter([1, 2, 3], ...)
If you don't want this, and you actually just want your list to be moved around as a single piece of data, then you should wrap it in another list:
[future] = client.scatter([[1, 2, 3]], ...)
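To see the two behaviours side by side, here is a minimal sketch. It assumes an in-process cluster started with Client(processes=False, ...) purely so that it runs without any external setup, and the helper name scatter_both_ways is made up for illustration. The unwrapped call returns one future per element, while the wrapped call returns a single future for the whole list. That may also be why graph construction is slower in case 2 for a long list, since every task that uses the data then depends on many futures instead of one, though I have not measured that here.

```python
from dask.distributed import Client


def scatter_both_ways(data):
    """Scatter `data` unwrapped and wrapped; report what comes back.

    Hypothetical helper for illustration only, not part of Dask.
    """
    # Assumption: a small in-process cluster (no separate worker processes,
    # dashboard disabled) so the sketch is self-contained. In real use you
    # would connect the Client to your existing scheduler instead.
    with Client(
        processes=False,
        n_workers=2,
        threads_per_worker=1,
        dashboard_address=None,
    ) as client:
        # Case 2: not wrapped. Each element becomes its own future.
        per_element = client.scatter(data, broadcast=True)

        # Case 1: wrapped. The whole list travels as one piece of data,
        # so exactly one future comes back.
        [whole] = client.scatter([data], broadcast=True)

        n_unwrapped = len(per_element)
        roundtrip = client.gather(whole)

    return n_unwrapped, roundtrip


n_unwrapped, roundtrip = scatter_both_ways([1, 2, 3])
print(n_unwrapped)  # number of futures produced by the unwrapped call
print(roundtrip)    # the original list, recovered from the single future
```

If downstream tasks need the list as one object, passing the single future from the wrapped call keeps each task's dependency list short.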