Tags: python-3.x, distributed, dask

dask processes tasks twice


I noticed that tasks in a dask graph can be executed several times by different workers.

I also see this message in the scheduler console log (I don't know whether it is related to resilience):

"WARNING - Lost connection to ... while sending result: Stream is closed"

Is there a way to prevent dask from executing the same task twice on different workers?

Note that I'm using dask 0.15.0 and distributed 1.15.1.

Thanks,

Bertrand


Solution

  • The short answer is "no".

    Dask reserves the right to call your function more than once. This can happen if a worker goes down, or if Dask load-balances and moves tasks around the cluster just as they have started running.

    However, you can significantly reduce the likelihood of a task running multiple times by turning off work stealing (a configuration-based alternative for newer versions is sketched after the snippet below):

    def turn_off_stealing(dask_scheduler):
        # Stop the periodic callback that drives the work-stealing extension
        dask_scheduler.extensions['stealing']._pc.stop()
    
    client.run_on_scheduler(turn_off_stealing)
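
    As a side note, more recent releases of dask/distributed expose work stealing as a configuration option, so the same effect can be had without reaching into scheduler internals. A minimal sketch, assuming a recent version and an in-process LocalCluster (so the scheduler picks up the local config setting):

    import dask
    from dask.distributed import Client, LocalCluster
    
    # Disable work stealing before the scheduler starts; with LocalCluster the
    # scheduler runs in this process, so the local config setting applies to it.
    dask.config.set({"distributed.scheduler.work-stealing": False})
    
    cluster = LocalCluster(n_workers=2)
    client = Client(cluster)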