
Wait for all LSF jobs with given name, overriding JOB_DEP_LAST_SUB = 1


I've got a large computational task consisting of several steps, which I run on a PC cluster managed by LSF.

Part of this task involves launching several parallel jobs with identical names. The jobs differ somewhat from one another, so it is hard to convert them into a job array.

The next step of the computation summarizes the results of these jobs, so it must wait until all of them have finished.

I'm trying to use the -w "ended(job-name)" command-line option of bsub, as usual, to specify the job dependency.

However, the admins of the cluster have set JOB_DEP_LAST_SUB = 1 in lsb.params.

According to the LSF manual, this makes LSF wait only for the most recently submitted job with the given name, instead of for all jobs with that name.

Is it possible to override this behavior for my task only, without asking the admins to reconfigure the whole cluster? The cluster is used by many people, so it is very unlikely that they would agree.

I cannot find any clues in the manual.


Solution

  • It looks like this setting cannot be overridden per job.

    As a workaround, I made the job names unique by appending a random value to each, then changed the dependency condition to -w "ended(job-name-*)" so that the wildcard matches all of them.
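
The workaround above could be scripted roughly as follows. This is only a sketch under assumptions: the function names, the "myjob" prefix and the script names are hypothetical, and instead of a random value it uses a per-run tag plus a counter, which gives the same uniqueness without the risk of collisions. It relies on the bsub options -J (job name) and -w (dependency expression), and on the wildcard match working on this cluster as reported above.

```shell
# Submit each worker script under a unique name "<prefix>-<tag>-<index>".
# The tag identifies one run; the index keeps names unique within that run.
submit_workers() {
    prefix=$1
    tag=$2
    shift 2
    i=0
    for script in "$@"; do
        i=$((i + 1))
        bsub -J "${prefix}-${tag}-${i}" "$script"
    done
}

# Submit the summary step so that it waits for every job whose name
# starts with "<prefix>-<tag>-". The double quotes keep the shell from
# interpreting the parentheses and the asterisk.
submit_summary() {
    prefix=$1
    tag=$2
    script=$3
    bsub -w "ended(${prefix}-${tag}-*)" "$script"
}

# Example use (hypothetical script names), with the shell PID as the run tag:
#   tag=$$
#   submit_workers myjob "$tag" ./step_a.sh ./step_b.sh ./step_c.sh
#   submit_summary myjob "$tag" ./summarize.sh
```

Putting the run tag into the names also keeps the wildcard from matching jobs left over from an earlier run of the same task.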