Search code examples
parallel-processingdaskdask-distributeddask-delayed

how to create writable shared array in dask


I'm new to Dask what i'm trying to find is "shared array between processes and it needed to be writable by any proccess" could someone can show me that? Top

a way to implement shared writable array in dask


Solution

  • Dask's internal abstraction is a DAG, a functional graph in which it is assumed that tasks act the same should you rerun them ("functionally pure"), since it's always possible that a task runs in two places, or that a worker which holds a task's output dies.

    Dask does not, therefore, support mutable data structures as task inputs/outputs normally. However, you can execute tasks that create mutation as a side-effect, such as any of the functions that write to disk.

    If you are prepared to set up your own shared memory and pass around handles to this, there is nothing stopping you from making functions that mutate that memory. The caveats around tasks running multiple times hold, and you would be on your own. There is no mechanism currently to do this kind of thing for you, but it is something I personally intend to investigate within the next few months.