I created a first cluster like this in my Jupyter notebook:
from dask.distributed import Client, LocalCluster
cluster = LocalCluster(name='clus1', n_workers=1, dashboard_address='localhost:8789')
client = Client(cluster)
Then I read my data using pandas and performed some preprocessing.
After that, I created a second cluster in a second Jupyter notebook:
from dask.distributed import Client, LocalCluster
cluster = LocalCluster(name='clus2', n_workers=1, dashboard_address='localhost:8790')
client = Client(cluster)
Now I want to fetch the data from one cluster into the other.
Is there any way to do this?
As noted in the comment by @mdurant, another option (if appropriate for the problem at hand) is to re-use the same cluster:
from dask.distributed import Client, LocalCluster
cluster = LocalCluster(name='clus1', n_workers=1, dashboard_address='localhost:8789')
client = Client(cluster)
client.write_scheduler_file('tmp_scheduler.dask')
Then, wherever needed, you can connect to that same cluster (from multiple notebooks):
from dask.distributed import Client
client = Client(scheduler_file='tmp_scheduler.dask')
This obviates the need to transfer data between clusters, since all notebooks operate on the same cluster.