Tags: python, dask, dask-distributed, dask-kubernetes

How do I get adaptive dask workers to run some code on startup?


I'm creating a dask scheduler using dask-kubernetes and putting it into adaptive mode.

from dask_kubernetes import KubeCluster
cluster = KubeCluster()
cluster.adapt(minimum=0, maximum=40)

I need each worker to run some setup code when it is created (setting some environment variables with os.environ) so that the tasks execute correctly.

I see in the docs that there is a --preload flag for workers started from the command line. I'm guessing I need to set that somewhere on the adaptive scheduler.

How do I pass code to my workers to be executed when they start?


Solution

  • If all you need is to set environment variables, you can probably handle this with the dask-kubernetes configuration file. I think that KubeCluster may even have an env= keyword or something similar.

    For more general code, you're correct that a preload script is currently the best approach. This isn't ideal in all situations, though. Ideally you would be able to register some startup code with the scheduler and have it handed off to all workers as they start up. This isn't implemented as of 2018-08-01, though.
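
As a rough illustration of the preload approach, here is a minimal sketch of a preload module. The file name worker_setup.py, the variable name MY_SETTING, and its value are made up for this example. The dask_setup hook is the function Dask looks for in a preload module and calls with the worker object at startup.

```python
# worker_setup.py: hypothetical preload module (the file name is an assumption)
import os

# Hypothetical environment variables; replace with whatever your tasks need.
WORKER_ENV = {"MY_SETTING": "some-value"}


def dask_setup(worker):
    """Hook that Dask calls once when a worker loads this preload module."""
    # Set the variables in the worker process before any task runs.
    os.environ.update(WORKER_ENV)
```

On the command line this would be loaded with something like dask-worker <scheduler-address> --preload worker_setup.py. With dask-kubernetes, the same flag would presumably go into the worker command arguments of the pod template, and the file has to be present in the worker image. Both points are assumptions about your setup rather than something confirmed above.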