Search code examples
daskdask-distributed

Dask : how the memory limit is calculated in "auto" mode?


The documentation shows the following formula in case of "auto" mode :

$ dask-worker .. --memory-limit=auto # TOTAL_MEMORY * min(1, nthreads / total_nthreads)

My CPU spec :

Architecture:                    x86_64
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1

My memory spec :

MemTotal:       16282416 kB
MemFree:         1142108 kB
MemAvailable:    9397036 kB

When I trigger the dask_worker command, the following output is displayed :

distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                          1
distributed.worker - INFO -                Memory:                   3.88 GiB
distributed.worker - INFO - -------------------------------------------------

Could you please explain, how 3.88 GiB memory is found ? It seems to mismatch with the previous formula


Solution

  • I suspect nthreads refers to how many threads this particular worker has available to schedule tasks on while total_nthreads refers to the total number of threads available on your system.

    The dask-worker CLI command has the same defaults as LocalCluster (see GitHub issue). Assuming the defaults for LocalCluster spin up n workers where n is the number of available cores on your system and assign m threads to each worker where m is the number of threads per core:

    n = 4 # number of cores 
    m = 1 # number of threads per core 
    
    TOTAL_MEMORY = 16282416 kB
    
    TOTAL_MEMORY * min(1, 1 / 4)
    
    > 4070604
    
    

    4070604 kB is 3.79 GiB

    See the docs here:

    https://docs.dask.org/en/latest/deploying-cli.html#dask-worker

    --nthreads

    Number of threads per process.

    --nprocs

    Deprecated. Use ‘–nworkers’ instead. Number of worker processes to launch. If negative, then (CPU_COUNT + 1 + nprocs) is used. Set to ‘auto’ to set nprocs and nthreads dynamically based on CPU_COUNT

    --nworkers <n_workers>

    Number of worker processes to launch. If negative, then (CPU_COUNT + 1 + nworkers) is used. Set to ‘auto’ to set nworkers and nthreads dynamically based on CPU_COUNT

    Also see the source for LocalCluster for how the defaults are set: