I have been working with Dask and I have a question about clients when processing a large script with high computational requirements:
from dask.distributed import Client

client = Client(n_workers=NUM_PARALLEL)
...
# more code
...
client.shutdown()
I have seen some people shutting down the client in the middle of the process and then initializing it again. Is this good for speed?
On the other hand, my workers are running out of memory. Do you know if it is good practice to call compute() on the Dask DataFrame several times, instead of computing it only once at the end, which may exceed the capacity of the machine?
I have seen some people shutting down the client in the middle of the process and then initializing it again. Is this good for speed?
IIUC, there is no effect on speed; if anything, there is a slight slowdown from the time needed to spin up a new scheduler/cluster. The only small advantage is if you are sharing resources: shutting down the cluster frees them up for others.
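If the goal is to clear worker memory rather than to free shared resources, Client.restart() (a real method on the distributed client) does that without tearing down and re-spinning the cluster. A minimal sketch, where NUM_PARALLEL and the "heavy stage" placeholders are assumptions for illustration:

from dask.distributed import Client

NUM_PARALLEL = 4  # hypothetical worker count

client = Client(n_workers=NUM_PARALLEL)
# ... heavy stage 1 ...
client.restart()  # drops all tasks and worker memory, keeps the cluster alive
# ... heavy stage 2 ...
client.shutdown()

This gets you the memory-clearing effect people are usually after, at a much lower cost than a full shutdown/re-initialize cycle.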
On the other hand, my workers are running out of memory. Do you know if it is good practice to call compute() on the Dask DataFrame several times, instead of computing it only once at the end, which may exceed the capacity of the machine?
This really depends on the DAG. There can be an advantage to computing at an intermediate step if it reduces the number of partitions/tasks, especially if some of the intermediate results would otherwise be computed multiple times.
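For that second case, persist() is often a better fit than repeated compute() calls, since it materializes a shared intermediate once in worker memory and lets later reductions reuse it. A minimal sketch, assuming a hypothetical input path and an "amount" column:

import dask
import dask.dataframe as dd

df = dd.read_csv("data/*.csv")   # hypothetical input files
cleaned = df.dropna().persist()  # materialize the shared intermediate once

total_ = cleaned["amount"].sum()
mean_ = cleaned["amount"].mean()

# Both reductions reuse the persisted partitions instead of
# re-reading and re-cleaning the data twice.
total, mean = dask.compute(total_, mean_)

Note that persist() keeps the intermediate in (distributed) memory, so it only helps if that intermediate actually fits; otherwise computing and reducing in stages, as you describe, is the fallback.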