dask, amazon-emr, rapids

Dask-CUDA on a multi-node EMR cluster is unable to detect nodes


I have set up an AWS EMR cluster with 10 core nodes of type g4dn.xlarge (each node contains 1 GPU). When I run the following commands in a Zeppelin notebook, I see only 1 worker allotted in my LocalCUDACluster:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
cluster = LocalCUDACluster()
client = Client(cluster)

I tried passing n_workers=10 explicitly but it resulted in an error.

How do I make sure my LocalCUDACluster utilizes the other 9 nodes as well? What is the right way to set up a multi-node Dask-CUDA cluster? Any help regarding this is appreciated.


Solution

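  • LocalCUDACluster only starts workers for the GPUs on the machine where it runs, so it will not pick up the other nodes. There are a few options for setting up a multi-worker cluster (with or without GPUs), described here.

    The docs don't seem to mention third-party solutions, but at the time of writing there are two companies offering these services: Coiled and Saturn Cloud.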
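
One of the manual options can be sketched as follows; treat it as a rough outline under stated assumptions, not a tested EMR recipe. It assumes a Dask scheduler has been started on one node with the `dask-scheduler` command and a `dask-cuda-worker` process has been started on each GPU node pointing at that scheduler. The notebook then connects to the scheduler instead of creating a LocalCUDACluster. The host name `scheduler-host`, the helper names `scheduler_address` and `connect`, and the use of the default port 8786 are illustrative assumptions.

```python
# Assumed to have been run beforehand, outside the notebook:
#   on one node:        dask-scheduler
#   on every GPU node:  dask-cuda-worker tcp://scheduler-host:8786
# "scheduler-host" is a placeholder for the scheduler node's address.


def scheduler_address(host: str, port: int = 8786) -> str:
    """Build the TCP address of a Dask scheduler (8786 is Dask's default port)."""
    return f"tcp://{host}:{port}"


def connect(host: str, expected_workers: int, port: int = 8786, timeout: float = 120):
    """Connect to an already running scheduler and wait for the workers to register."""
    # Imported lazily so the address helper above works without dask installed.
    from dask.distributed import Client

    client = Client(scheduler_address(host, port))
    # Block until the expected number of workers has joined the scheduler.
    client.wait_for_workers(expected_workers, timeout=timeout)
    return client


# Example usage from the notebook (hypothetical host name):
# client = connect("scheduler-host", expected_workers=10)
# print(len(client.scheduler_info()["workers"]))
```

With one GPU per g4dn.xlarge node, each `dask-cuda-worker` process should contribute one worker, so the expected count equals the number of nodes running a worker process.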