I'm trying to use dask-jobqueue on our HPC system, and this is the code I'm using:
from dask_jobqueue import SLURMCluster
from dask.distributed import Client

cluster = SLURMCluster(cores=2, memory='20GB', processes=1,
                       log_directory='logs',
                       death_timeout=6000, walltime='8:00:00',
                       shebang='#!/usr/bin/ bash')
cluster.scale(5)
client = Client(cluster)
After executing the code, I can use squeue to check for submitted jobs, and I can see 5 of them in the running (R) state. But the jobs are killed after a few seconds. In the .err files, I found this message:
slurmstepd-midway2-0354: error: execve(): /tmp/slurmd/job10469239/slurm_script: Permission denied
I'm very new to Dask and am not sure what's wrong. Any ideas would be appreciated! Thanks!
The main problem is the incorrect shebang: the stray space in '#!/usr/bin/ bash' makes the kernel treat the directory /usr/bin/ as the interpreter, and executing a directory fails with exactly the "Permission denied" error you see. The fix is:
# ...
shebang='#!/usr/bin/env bash')
# ...
Depending on your SLURM configuration, you might also need to specify queue (the name of an appropriate SLURM partition), as in the combined example below.
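Putting both changes together, a corrected setup might look like this (a sketch; 'your_partition' is a placeholder, so substitute a partition name that sinfo lists on your cluster):

from dask_jobqueue import SLURMCluster
from dask.distributed import Client

cluster = SLURMCluster(
    cores=2,
    memory='20GB',
    processes=1,
    queue='your_partition',         # placeholder: pick a real partition from `sinfo`
    log_directory='logs',
    death_timeout=6000,
    walltime='8:00:00',
    shebang='#!/usr/bin/env bash',  # corrected: no stray space in the path
)
cluster.scale(5)
client = Client(cluster)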
If problems come up in the future, you can inspect the script that dask_jobqueue submits with:
print(cluster.job_script())
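The first line of that script is the shebang, so you can also check the fix programmatically (a minimal sketch; job_script() returns the batch script as a plain string):

script = cluster.job_script()
# After the fix, the generated script should start with the corrected shebang.
assert script.splitlines()[0] == '#!/usr/bin/env bash'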