
How to pass client-side dependencies to a dask-worker node


scriptA.py contents:

import shlex, subprocess
from dask.distributed import Client

def my_task(params):
    print("params[1]", params[1])  # prints: python scriptB.py arg1 arg2
    # This runs on the worker, so scriptB.py and its imports must exist on that node
    child = subprocess.Popen(shlex.split(params[1]), shell=False)
    child.communicate()
    return child.returncode  # returned to the client via future.result()

if __name__ == '__main__':

    clienta = Client("192.168.1.3:8786")
    params = ["dummy_arguments", "python scriptB.py arg1 arg2"]
    future = clienta.submit(my_task, params)
    print(future.result())

print("over!")
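Because `my_task` is executed by the worker process on node-2, the command in `params[1]` is resolved on node-2: `python scriptB.py` looks for scriptB.py in the worker's current directory, and the child interpreter can only import modules that exist on node-2. A minimal sketch of a variant that runs the command inside an explicit directory, assuming scriptB.py, file1.py, file2.py and folder1 all sit together in that directory on node-2 (for example on a network file system mounted on every node). `run_in_dir` is a hypothetical helper name; it relies on the fact that Python puts the running script's directory at the front of `sys.path`, so sibling modules become importable.

```python
import shlex
import subprocess
import sys

def run_in_dir(command, workdir):
    """Run a 'python script.py args' style command with workdir as its cwd.

    Hypothetical helper: workdir is assumed to contain the script and the
    modules/packages it imports, since the script's own directory is placed
    on sys.path of the child interpreter.
    """
    args = shlex.split(command)
    if args and args[0] == "python":
        # Use the same interpreter as the calling (worker) process
        args[0] = sys.executable
    result = subprocess.run(args, cwd=workdir, capture_output=True, text=True)
    return result.returncode, result.stdout
```

Inside `my_task` one could then return `run_in_dir(params[1], "/shared/project")`, where `/shared/project` is a placeholder for wherever the files live on node-2.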

scriptB.py contents:

import file1, file2
from folder1 import file4
import time

for _ in range(3):
    file1.do_something()
    file4.try_something()
    print("sleeping for 1 sec")
    time.sleep(1)
print("woke up..")

scriptA.py runs on node-1 (192.168.23.12:9784), the dask-worker runs on node-2 (192.168.54.86:4658), and the dask-scheduler runs on node-3 (192.168.1.3:8786).

How can scriptA.py, running on node-1, pass the dependencies that scriptB.py needs (folder1, file1, file2, etc.) to the dask-worker on node-2?


Solution

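  • You might want to look at the Client.upload_file method, which sends a local file to every worker and places it where the worker's Python can import it (.py, .egg and .zip files are supported).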

    client.upload_file('/path/to/file1.py')
    

    For larger dependencies, though, you are generally expected to manage them yourself. Larger deployments typically rely on another mechanism, such as Docker images or a network file system, to keep software dependencies uniform across all nodes.
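`upload_file` takes one file at a time, so single modules like file1.py and file2.py can be uploaded directly, while a directory such as folder1 can be zipped first, since the distributed documentation states uploaded .zip files become importable on the workers. A minimal sketch, assuming folder1 is a regular package with an `__init__.py`; `zip_package` and `ship_dependencies` are hypothetical helper names, and `client` is any object with an `upload_file(path)` method, such as a `dask.distributed.Client`.

```python
import os
import zipfile

def zip_package(package_dir, zip_path):
    """Zip package_dir so the package folder itself is at the archive root.

    folder1/__init__.py and folder1/file4.py become archive entries
    'folder1/__init__.py' and 'folder1/file4.py', so `import folder1.file4`
    works once the zip file is on sys.path.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(package_dir):
            # Skip bytecode caches; the worker compiles what it needs
            dirs[:] = [d for d in dirs if d != "__pycache__"]
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, parent))
    return zip_path

def ship_dependencies(client, module_files, package_dirs, workdir="."):
    """Upload single-file modules as-is and package directories as zips."""
    for path in module_files:
        client.upload_file(path)
    for pkg in package_dirs:
        name = os.path.basename(os.path.abspath(pkg))
        zip_path = os.path.join(workdir, name + ".zip")
        client.upload_file(zip_package(pkg, zip_path))
```

A call such as `ship_dependencies(clienta, ["file1.py", "file2.py"], ["folder1"])` makes these modules importable from tasks running inside the worker process (for example `from folder1 import file4` inside `my_task`). By default, uploaded .py files are also imported on each worker, so a script with top-level side effects, like scriptB.py's loop, would run at upload time, and a separate `python scriptB.py` child process does not inherit the worker's `sys.path`.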