Search code examples
pythondaskcpu-usagedask-distributed

Default n_workers when creating a Dask cluster?


Simple question. If I create a Dask cluster using the following code:

from dask.distributed import Client

client = Client()

How many workers will it create? I ran this code on one machine, and it created 4 workers. I ran this same code on a server, and it created 8 workers. Does it just create as much as it possibly can based on resources available? In the source code, there is no default value for n_workers listed in the docstrings. I'm trying see how to create a cluster automatically without having to know in advance the resources available to me.

class LocalCluster(SpecCluster):
"""Create local Scheduler and Workers

This creates a "cluster" of a scheduler and workers running on the local
machine.

Parameters
----------
n_workers: int
    Number of workers to start
memory_limit: str, float, int, or None, default "auto"
    Sets the memory limit *per worker*.

    Notes regarding argument data type:

    * If None or 0, no limit is applied.
    * If "auto", the total system memory is split evenly between the workers.
    * If a float, that fraction of the system memory is used *per worker*.
    * If a string giving a number of bytes (like ``"1GiB"``), that amount is used *per worker*.
    * If an int, that number of bytes is used *per worker*.

Solution

  • dask.distributed uses os.cpu_count() to get the number of CPUs available on the machine, which is then used with the number of threads per worker to calculate the number of workers.