Simple question. If I create a Dask cluster using the following code:
from dask.distributed import Client
client = Client()
How many workers will it create? I ran this code on one machine and it created 4 workers; I ran the same code on a server and it created 8 workers. Does it just create as many workers as it possibly can based on the resources available? In the source code, there is no default value for n_workers
listed in the docstrings. I'm trying to see how to create a cluster automatically without having to know in advance the resources available to me.
class LocalCluster(SpecCluster):
    """Create local Scheduler and Workers

    This creates a "cluster" of a scheduler and workers running on the local
    machine.

    Parameters
    ----------
    n_workers: int
        Number of workers to start
    memory_limit: str, float, int, or None, default "auto"
        Sets the memory limit *per worker*.

        Notes regarding argument data type:

        * If None or 0, no limit is applied.
        * If "auto", the total system memory is split evenly between the workers.
        * If a float, that fraction of the system memory is used *per worker*.
        * If a string giving a number of bytes (like ``"1GiB"``), that amount is used *per worker*.
        * If an int, that number of bytes is used *per worker*.
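The memory_limit rules in that docstring can be sketched as a small helper. This is only an illustration of the documented behavior, not dask's actual code; `resolve_memory_limit` and `total_memory` are names invented here (dask detects the machine's total RAM itself):

```python
def resolve_memory_limit(limit, n_workers, total_memory):
    """Map a memory_limit argument to a per-worker byte count,
    following the rules in the LocalCluster docstring.
    total_memory is the machine's total RAM in bytes."""
    units = {"kiB": 2**10, "MiB": 2**20, "GiB": 2**30}
    if limit is None or limit == 0:
        return None                       # no limit applied
    if limit == "auto":
        return total_memory // n_workers  # split evenly between workers
    if isinstance(limit, float):
        return int(limit * total_memory)  # fraction of system memory, per worker
    if isinstance(limit, str):            # e.g. "1GiB" -> bytes, per worker
        for suffix, factor in units.items():
            if limit.endswith(suffix):
                return int(float(limit[: -len(suffix)]) * factor)
        return int(limit)                 # bare numeric string
    return int(limit)                     # plain int: bytes, per worker

# On a hypothetical 16 GiB machine with 4 workers:
print(resolve_memory_limit("auto", 4, 16 * 2**30))   # 4 GiB per worker
print(resolve_memory_limit("1GiB", 4, 16 * 2**30))   # 1 GiB per worker
```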
dask.distributed uses os.cpu_count() to get the number of CPUs available on the machine; that count, combined with the number of threads per worker, determines the default number of workers.
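To make that heuristic concrete, here is a rough sketch of how a CPU count can be split into a (workers, threads-per-worker) pair whose product covers every core. The exact split dask picks has varied between versions (which would explain seeing 4 workers on one machine and 8 on another), so `guess_workers_threads` is an illustration invented here, not dask's actual function:

```python
import math
import os

def guess_workers_threads(n_cpus=None):
    # Split the CPU count into (workers, threads_per_worker)
    # so that workers * threads_per_worker == n_cpus.
    n = n_cpus if n_cpus is not None else os.cpu_count()
    if n <= 4:
        return n, 1  # few cores: one single-threaded worker per core
    # Otherwise take the smallest factor of n that is >= sqrt(n),
    # favouring more processes than threads per process.
    procs = min(f for f in range(1, n + 1) if n % f == 0 and f >= math.sqrt(n))
    return procs, n // procs

print(guess_workers_threads(4))  # (4, 1): four single-threaded workers
print(guess_workers_threads(8))  # (4, 2): the product still covers all 8 CPUs
```

If you want to avoid the heuristic entirely, you can pass `n_workers` and `threads_per_worker` explicitly, e.g. `Client(n_workers=2, threads_per_worker=4)`; those keyword arguments are forwarded to LocalCluster.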