Tags: linux, parallel-processing, clearcase, lsf

Launching a parallel bsub job in a ClearCase environment


ClearCase does not work with LSF distributed multi-host parallel jobs when more than one host is specified.

Reason: ClearCase does not mount the file system on all hosts when multi-host simulations are dispatched to LSF.

As a result, the job is terminated: included files are not found, or output cannot be written, because the file system does not exist on all hosts.

The ClearCase + LSF integration has to guarantee by construction that the job is dispatched correctly in 100% of cases, which it currently does not.

Please help me with this issue.


Solution

  • The LSF/ClearCase integration uses the daemon.wrap program to set the view on the execution host and then launch the job inside the view. That wrapper doesn't support cross-host parallel jobs.

    You'll have to work around the limitation in your job script. You can disable the daemon wrapper by making sure that $CLEARCASE_ROOT is not set in your job submission environment. Then, in the execution environment, the job script and each process participating in the parallel job can call cleartool setview <options> <real job command>.
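As a minimal sketch of the submission side, the helper below runs a command with CLEARCASE_ROOT removed from its environment, so the daemon wrapper should not be triggered. The helper name without_clearcase_root, the slot count, and the job script path ./job.sh are hypothetical placeholders, not part of LSF or ClearCase.

```shell
#!/bin/sh
# Sketch only: without_clearcase_root is a hypothetical helper name.

# Run a command with CLEARCASE_ROOT removed from its environment.
# The subshell keeps the variable untouched in the calling shell.
without_clearcase_root() {
    ( unset CLEARCASE_ROOT; "$@" )
}

# Hypothetical usage (4 slots, job script ./job.sh are assumptions):
# without_clearcase_root bsub -n 4 ./job.sh
```

Using a subshell rather than a plain unset means an interactive shell that is already working inside a view keeps its own setting after the submission.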

    Launching your job with blaunch might make things easier. Without blaunch, LSF starts a single process on the first execution host. With blaunch, LSF launches one process per slot across all of the allocated execution hosts, and each process can then set the view and start the real job.
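The blaunch approach could look roughly like the job script fragment below. It is a sketch under assumptions: it assumes LSB_HOSTS is populated with one host entry per allocated slot, that cleartool is on the PATH of every execution host, and that the view tag and job command are passed in by the caller. The function name launch_in_view and the LAUNCHER override (useful for a dry run with echo) are hypothetical.

```shell
#!/bin/sh
# Sketch only: launch_in_view and LAUNCHER are hypothetical names.

# Start one task per entry in LSB_HOSTS (assumed: one entry per slot).
# Each task sets the ClearCase view and runs the real command inside it.
#   $1 = view tag, $2 = real job command line
launch_in_view() {
    view_tag=$1
    real_cmd=$2
    for host in $LSB_HOSTS; do
        # LAUNCHER defaults to blaunch; set LAUNCHER=echo for a dry run.
        "${LAUNCHER:-blaunch}" "$host" cleartool setview -exec "$real_cmd" "$view_tag" &
    done
    # Block until every launched task has finished.
    wait
}

# Hypothetical usage inside the job script:
# launch_in_view my_view "./run_simulation"
```

Running it first with LAUNCHER=echo prints the command that would be sent to each host, which is a cheap way to check the host list and quoting before submitting a real job.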

    Good luck!