I submit an array of many jobs to an LSF cluster. Most run and finish in the DONE state, but some may EXIT. I need a way to re-run only the member jobs of the array that EXITed.
Thanks.
I've been playing around with the same issues and the command:
brequeue -e <jobarrayid>
should do what you're after. You don't need to specify which elements should be rerun; the -e switch should pick out only the EXIT'd indexes.
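
If you end up doing this often, a small wrapper might save some typing. This is only a rough sketch: the function name requeue_exited and the DRY_RUN variable are made up for illustration and are not part of LSF. The only real LSF command it calls is brequeue -e, as above.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): requeue only the EXITed
# elements of a job array by calling `brequeue -e <jobarrayid>`.
requeue_exited() {
    array_id="$1"
    if [ -z "$array_id" ]; then
        echo "usage: requeue_exited <jobarrayid>" >&2
        return 1
    fi
    # DRY_RUN is an assumption of this sketch: when set to 1, the
    # command is printed instead of executed, so it can be checked first.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "brequeue -e $array_id"
    else
        brequeue -e "$array_id"
    fi
}
```

For example, with a made-up array ID of 1234, setting DRY_RUN=1 and calling requeue_exited 1234 prints the command that would be run; unset DRY_RUN to actually requeue.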