I have several Python scripts that run different Dask tasks from different databases, and I used the Python multiprocessing module to run all of the scripts simultaneously. I checked in my task manager that the scripts are running in parallel, and I was able to access my Dask dashboard. However, the dashboard is not showing anything. Here is the screenshot of my Dask dashboard.
This is my Python code snippet sample (simplified):
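    from dask.distributed import Client

    if __name__ == '__main__':
        # Set up the Dask distributed client (this also starts a local cluster)
        client = Client(n_workers=4, threads_per_worker=4)
        """
        call the scripts here and store it in a process list
        """
        # Start every script as its own OS process
        for process in processes:
            process.start()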
When combining multiple processes with Client(), you are creating a default cluster in each process. The one in the main process has the dashboard that you can see; the others also have dashboards, but on different ports. If you capture the output of the subprocesses, it will tell you which ports.
This is likely not what you meant to do. If you want multiple processes to talk to a single cluster, you should create that cluster first, and then connect to it from each process with something like Client("tcp://localhost:8786").
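As a rough sketch of that idea (not necessarily how your scripts are organised): the main process starts one LocalCluster and hands its scheduler address to each child process, which connects to it instead of creating its own cluster. The names `square` and `run_job` are made up for illustration, and the "spawn" start method is my own choice to avoid forking a process that already runs a cluster.

```python
import multiprocessing as mp

from dask.distributed import Client, LocalCluster


def square(x):
    # Hypothetical stand-in for the real work one script would do
    return x * x


def run_job(scheduler_address, job_id):
    # Connect to the EXISTING cluster; passing an address means
    # no new cluster (and no extra dashboard) is created here
    with Client(scheduler_address) as client:
        return client.submit(square, job_id).result()


def main():
    # One cluster, and therefore one dashboard, for everything
    with LocalCluster(n_workers=4, threads_per_worker=4) as cluster:
        print("Dashboard:", cluster.dashboard_link)
        address = cluster.scheduler_address

        ctx = mp.get_context("spawn")
        processes = [
            ctx.Process(target=run_job, args=(address, job_id))
            for job_id in range(3)
        ]
        for process in processes:
            process.start()
        for process in processes:
            process.join()


if __name__ == "__main__":
    main()
```

All tasks submitted by the child processes should then show up on the single dashboard printed by the main process.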
However, this all also raises the question: if you want to use Dask, why are you also creating processes? Why not just let Dask take care of executing things, e.g., with client.submit()?
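For example, something along these lines, assuming each script can be wrapped in a plain function. Here `load_and_process` and the source names are placeholders I made up, not anything from your code:

```python
from dask.distributed import Client


def load_and_process(source_name):
    # Placeholder for the body of one of the per-database scripts
    return f"processed {source_name}"


def main():
    sources = ["db_a", "db_b", "db_c"]  # hypothetical database names

    # A single client and cluster; Dask schedules the tasks on its workers
    with Client(n_workers=4, threads_per_worker=4) as client:
        futures = [client.submit(load_and_process, name) for name in sources]
        # Block until every task has finished and collect the results
        return client.gather(futures)


if __name__ == "__main__":
    print(main())
```

Because there is only one cluster, every task appears on the one dashboard, and there is no need for the multiprocessing module at all.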