
Forcing LSF to execute jobs on different hosts


I have a setup consisting of 3 workers and a management node, which I use for submitting tasks. I would like to run a setup script concurrently on all workers:

bsub -q queue -n 3 -m 'h0 h1 h2' -J "%J_%I" mpirun setup.sh

As far as I understand, I could use the 'ptile' resource constraint to force execution on all workers:

bsub -q queue -n 3 -m 'h0 h1 h2' -J "%J_%I" -R 'span[ptile=1]' mpirun setup.sh

However, occasionally my script gets executed several times on the same worker.

Is this expected behavior, or is there a bug in my setup? Is there a better way to enforce multi-worker execution?


Solution

  • Your understanding of span[ptile=1] is correct: LSF will place at most one of the job's slots on each host. If fewer hosts are available than -n requests, the job will pend until enough hosts free up.

    However, occasionally my script gets executed several times on the same worker.

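    I suspect that it's something in your script or in how you read its output. For example, with -o LSF appends to the stdout file by default, so output left over from earlier runs can make it look as if the script ran several times on one host. Use -oo to overwrite the file instead.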
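
One way to check whether the allocation or the output file is at fault is to have the job report the hosts it was actually given. The following is only a sketch: it assumes LSF sets the LSB_HOSTS environment variable inside the job (one host name per allocated slot; LSF may leave it unset for very long host lists), and duplicate_hosts is a hypothetical helper name, not an LSF command.

```shell
#!/bin/sh
# Sketch: report hosts that were given more than one slot of this job.
# Assumption: LSB_HOSTS holds one host name per allocated slot,
# e.g. "h0 h1 h2" for -n 3 with span[ptile=1].

# Hypothetical helper: print every host name that occurs more than once
# in the space-separated list passed as the first argument.
duplicate_hosts() {
    # $1 is left unquoted on purpose so the list splits into words.
    printf '%s\n' $1 | sort | uniq -d
}

dups=$(duplicate_hosts "${LSB_HOSTS:-}")
if [ -n "$dups" ]; then
    echo "hosts with more than one slot: $dups" >&2
else
    echo "allocation OK: ${LSB_HOSTS:-<not inside an LSF job>}"
fi

# Possible submission (file name is illustrative); -oo overwrites the
# output file on each run instead of appending to it:
#   bsub -q queue -n 3 -m 'h0 h1 h2' -R 'span[ptile=1]' -oo setup.%J.out mpirun setup.sh
```

If the check reports no duplicates but the output file still shows repeated runs on one host, stale output from an earlier submission is the more likely explanation.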