Tags: python, dask-distributed, cpu-cores

Dask Distributed - how to run one task per worker, making that task use all the cores available on the worker?


I'm very new to the distributed Python library. I have 4 workers, and I have successfully launched some parallel runs using 14 cores (out of the 16 available) on each worker, resulting in 4*14 = 56 tasks running in parallel.

But how do I proceed if I want only one task at a time on each worker? That way, I expect that one task to use the worker's 14 cores in parallel.


Solution

  • Dask workers maintain a single thread pool that they use to launch tasks. Each task always consumes one thread from this pool; you cannot tell a task to take many threads from it.

    However, there are other ways to control and restrict concurrency within dask workers. In your case you might consider defining worker resources. This lets you stop many big tasks from running at the same time on the same worker.

    In the following example we define that each worker has one Foo resource and that each task requires one Foo to run. This will stop any two tasks from running concurrently on the same worker.

    dask-worker scheduler-address:8786 --resources Foo=1
    dask-worker scheduler-address:8786 --resources Foo=1
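
    (Starting the workers with --nthreads 1 would also cap each one at a single running task, but the resource approach additionally leaves the thread pool free for any unannotated tasks.)

    If you build the cluster from Python rather than the command line, the same thing can be expressed with LocalCluster, which forwards extra keyword arguments to its workers. A minimal sketch, assuming the worker and thread counts from your question:

    from dask.distributed import Client, LocalCluster

    # Four workers with 14 threads each, mirroring the question's setup;
    # each worker gets a single 'Foo' resource, like --resources Foo=1.
    cluster = LocalCluster(n_workers=4, threads_per_worker=14,
                           resources={'Foo': 1})
    client = Client(cluster)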
    

    Then, however the workers were started, mark each task as requiring one Foo when you submit it:

    from dask.distributed import Client
    client = Client('scheduler-address:8786')
    futures = client.map(my_expensive_function, ..., resources={'Foo': 1})
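
    Note that reserving a worker this way does not by itself make the task parallel; my_expensive_function has to fan its work out across the cores on its own. A rough sketch of one way to do that, where process_piece and the chunked input are placeholders for the real workload:

    from concurrent.futures import ProcessPoolExecutor

    def process_piece(item):
        # Placeholder for the real per-item computation.
        return item ** 2

    def my_expensive_function(chunk):
        # Fan the work out over 14 local processes so this single
        # dask task keeps the worker's cores busy by itself.
        with ProcessPoolExecutor(max_workers=14) as pool:
            return list(pool.map(process_piece, chunk))

    One caveat: workers started under the default nanny run as daemonic processes, which are not allowed to spawn child processes, so this pattern may require running dask-worker with --no-nanny; with code that releases the GIL (NumPy, most I/O), an inner thread pool avoids the issue entirely.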