python dask hadoop-yarn amazon-emr dask-distributed

How to Show Dask Dashboard Link When Submitting Dask-Yarn Job Remotely?


Problem

Would anyone happen to know how to retrieve the dask dashboard link when I submit my dask-yarn job? I have a print statement for displaying the dask dashboard link, but it doesn't show up in the console. I've also tried logging to stdout and trying to see if it appears in the yarn logs, but still no luck. Any help would be greatly appreciated!

Code Example:

submit.sh

dask-yarn submit \
  --name uq_component_batch_inference \
  --deploy-mode remote \
  --environment uq_component_dask.tar.gz \
  --worker-count 500 \
  --worker-vcores 1 \
  --worker-memory 8GiB \
  --worker-env TOKENIZERS_PARALLELISM=True \
  --worker-restarts 9 \
  main.py

main.py

import logging
import sys
from dask.distributed import Client
from dask_yarn import YarnCluster

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
cluster = YarnCluster.from_current()
client = Client(cluster)
logging.debug(f"dashboard link: {client.dashboard_link}")
run()

What I've tried:

yarn logs -applicationId application_1637895115092_0039 > temp-file.log

Solution

  • One option is to explicitly write the scheduler information into a file:

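    from dask.distributed import Client

    # A bare Client() starts a local cluster; in main.py, reuse Client(cluster) instead.
    client = Client()
    # Write the scheduler's connection details to a JSON file.
    client.write_scheduler_file("scheduler.json")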

    Note that the file doesn't contain an explicit dashboard link, but it does record the scheduler address, and the dashboard port is under services -> dashboard.
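If it helps, here is a minimal sketch of turning that file back into a link. It assumes the JSON has a top-level "address" entry (something like tcp://host:8786) next to services -> dashboard, and that the dashboard is served at /status; the function name is made up for illustration.

```python
import json
from urllib.parse import urlsplit


def dashboard_url_from_scheduler_file(path):
    """Build a dashboard URL from a file written by Client.write_scheduler_file.

    Assumes the JSON contains an "address" entry such as "tcp://10.0.0.1:8786"
    and a dashboard port under "services" -> "dashboard".
    """
    with open(path) as f:
        info = json.load(f)

    # Host the scheduler listens on, taken from its comm address.
    host = urlsplit(info["address"]).hostname
    # Dashboard port, stored under services -> dashboard.
    port = info["services"]["dashboard"]

    # Assumption: the dashboard landing page lives at /status.
    return f"http://{host}:{port}/status"
```

Calling dashboard_url_from_scheduler_file("scheduler.json") should then give a URL you can open, provided the scheduler host is reachable from your machine.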

    Another option is to write the value of client.dashboard_link to a file, rather than logging it as in the snippet you provided.
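A rough sketch of the file-based variant is below. The helper name and default path are hypothetical; the only thing taken from the question is the client.dashboard_link attribute.

```python
from pathlib import Path


def save_dashboard_link(client, path="dashboard_link.txt"):
    """Write client.dashboard_link to a text file and return the link.

    `client` is expected to be a dask.distributed.Client, although any
    object with a `dashboard_link` attribute works. The function name and
    default path are hypothetical.
    """
    link = client.dashboard_link
    # One line per file so it is easy to read back with cat or similar.
    Path(path).write_text(link + "\n")
    return link
```

In main.py this would be called right after client = Client(cluster), for example save_dashboard_link(client). Keep in mind that with --deploy-mode remote the script runs inside a YARN container, so the file ends up on whichever node hosts that container; you will probably want to point path at a location you can actually read from.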