python · dask · dask-distributed · apscheduler

Submit worker functions in dask distributed without waiting for the functions to end


I have this Python code that uses the apscheduler library to submit jobs, and it works fine:

from apscheduler.schedulers.background import BackgroundScheduler

def function_to_submit(elem):
    print(str(elem))

scheduler = BackgroundScheduler()
array = [1, 3, 5, 7]

for elem in array:
    scheduler.add_job(function_to_submit, kwargs={'elem': elem})

scheduler.start()

Note that the jobs are submitted in parallel, and that the code does NOT wait for them to finish.

What I need is to migrate this code to Dask distributed so that the work runs on workers. The problem is that when I gather the futures, the code waits until all the functions end, and I need the code to continue. How can I achieve that?

from dask.distributed import Client

client = Client('127.0.0.1:8786')
future1 = client.submit(function_to_submit, 1)
future3 = client.submit(function_to_submit, 3)
L = [future1, future3]
client.gather(L)  # <-- this waits until all the futures end

Solution

  • Dask distributed provides a fire_and_forget function, an alternative to e.g. client.compute or dask.distributed.wait for when you want the scheduler to hang on to tasks even after the futures have fallen out of scope in the Python process that submitted them.

    from dask.distributed import Client, fire_and_forget
    client = Client('127.0.0.1:8786')
    fire_and_forget(client.submit(function_to_submit, 1))
    fire_and_forget(client.submit(function_to_submit, 3))
    
    # your script can now end and the scheduler will
    # continue executing these until they end or the
    # scheduler is terminated
    

    See the dask.distributed docs on Futures for examples and other usage patterns.
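
     For completeness, here is a minimal sketch of the loop from the question rewritten with fire_and_forget (reusing function_to_submit and the scheduler address from the question; adjust both to your setup):

    from dask.distributed import Client, fire_and_forget

    def function_to_submit(elem):
        print(str(elem))  # note: output appears in the worker's log, not on the client

    client = Client('127.0.0.1:8786')
    array = [1, 3, 5, 7]

    for elem in array:
        # submit each task and tell the scheduler to keep it alive
        # even after this future goes out of scope
        fire_and_forget(client.submit(function_to_submit, elem))

    # the script can continue (or exit) here; the workers keep running the tasks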