I am new to Slurm (and to HPC in general) and I have written a script that I want to execute. I am also being very careful because the cluster belongs to a different institute and I do not want to break or destroy anything through my own inexperience. My script is taking a lot of time and I want it to run faster. I read on the wiki that jobs like this are sometimes called embarrassingly parallel (meaning they are very easy to parallelize).
How should I modify my script so that it runs faster using more CPUs? (It takes 16 minutes to run for any specific combination of i, j, k.) Can I do something so that different values of i, j, k are computed on different CPUs in parallel? Any help is very much appreciated.
#!/bin/sh -e
#SBATCH -p hh
#SBATCH -o job.log
#SBATCH -e job.log
#SBATCH --exclusive
#SBATCH --job-name=myjob
#SBATCH --ntasks=1
#SBATCH -c 128
#SBATCH --hint nomultithread
#SBATCH --time=1-0
#SBATCH --exclude=hh003
for i in $(seq 1.0 0.05 3.65); do
    for j in $(seq 3 7); do
        for k in $(seq 0.01 0.01 0.08); do
            do something
        done
    done
done
Since you seem not to have too many tasks to run, a super quick and easy solution would be to simply make a bash script that generates the job requests. Make a bash file, say filename.sh, with the contents:
#!/bin/sh -e
for i in $(seq 1.0 0.05 3.65); do
    for j in $(seq 3 7); do
        for k in $(seq 0.01 0.01 0.08); do
            # pass the actual loop values, not the literal letters i, j, k
            sbatch batch_request_filename.sh "$i" "$j" "$k"
        done
    done
done
And a second file (in this case batch_request_filename.sh) that has the code that you would like to parallelize and all of the #SBATCH entries that you need. In it, use $1, $2, and $3 for i, j, and k, respectively.
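For reference, a rough sketch of what batch_request_filename.sh could look like. This is only an illustration: the partition and job name are copied from your original script, the one-hour time limit is an assumption based on the ~16 minutes per combination you mention, and the echo line is a placeholder for your real computation. Note that --exclusive and -c 128 are dropped, since each job now only needs one CPU.
#!/bin/sh -e
#SBATCH -p hh
#SBATCH --job-name=myjob
#SBATCH --ntasks=1
#SBATCH -c 1
#SBATCH --time=0-1
# %j below expands to the job ID, so each job writes to its own log file
#SBATCH -o job_%j.log
#SBATCH -e job_%j.log

# positional arguments passed in by the master script
i=$1
j=$2
k=$3

# replace this echo with your actual "do something" computation
echo "running i=$i j=$j k=$k"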
To run it, you would have to make the master file executable with chmod u+x filename.sh and then use ./filename.sh when you want to create the jobs.
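Putting it together, the submission would look something like this from the login node (the squeue call is just an optional check that the jobs were actually queued):
chmod u+x filename.sh   # one-time: make the master script executable
./filename.sh           # submits one sbatch job per (i, j, k) combination
squeue -u $USER         # optional: list your queued and running jobs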
This is by no means a perfect solution, but it is very quick to implement. Don't use this if you have too many tasks to run, though, as you might overwhelm the job scheduler.
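If the number of combinations ever grows, one crude way to be gentler on the scheduler is to pause briefly between submissions inside the master loop, for example:
sbatch batch_request_filename.sh "$i" "$j" "$k"
sleep 1   # small pause between submissions so the scheduler is not flooded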