
QSUB: Specify output and error files for each task in Job Array


Hopefully this is not a duplicate, and not just a problem with our cluster's configuration...

I am submitting a job array to a cluster using qsub with the following command:

qsub -q QUEUE -N JOBNAME -t 1:10 -e ${ERRFILE}_$SGE_TASK_ID /path/to/script.sh

where

ERRFILE=/home/USER/somedir/errors.

The idea is to specify an error file (and, analogously, an output file) whose name contains the ID of the task within the job array.

So far I have learned that the line

#$ -e ${ERRFILE}_$SGE_TASK_ID

inside script.sh does not work, because it is a comment and is not evaluated by bash. My first command does not work either, because $SGE_TASK_ID is only set AFTER the job is submitted.
I read here that escaping the evaluation of $SGE_TASK_ID (in that link it's PBS' $PBS_JOBID, but a similar problem) should work, but when I tried

qsub -q QUEUE -N JOBNAME -t 1:10 -e ${ERRFILE}_\$SGE_TASK_ID /path/to/script.sh

it did not work as expected.
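
To make the difference between the two commands concrete, here is a minimal sketch of what the submitting shell actually hands to qsub in each case. The helper show_args is hypothetical and only prints its arguments; nothing is submitted.

```shell
#!/bin/bash
# Hypothetical helper: print each argument exactly as a command would receive it.
show_args() { printf '[%s]\n' "$@"; }

ERRFILE=/home/USER/somedir/errors
unset SGE_TASK_ID   # the variable does not exist in the submitting shell

# Unescaped: the local shell expands $SGE_TASK_ID (to nothing) before qsub runs.
show_args ${ERRFILE}_$SGE_TASK_ID     # prints [/home/USER/somedir/errors_]

# Escaped: the literal text $SGE_TASK_ID is passed through unchanged.
show_args ${ERRFILE}_\$SGE_TASK_ID    # prints [/home/USER/somedir/errors_$SGE_TASK_ID]
```

So the escaping itself does what it should. If the path still is not expanded per task, the likely cause is that Grid Engine does not treat $SGE_TASK_ID as a name it substitutes inside the -e path.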

Am I missing something obvious? Is it possible to use $SGE_TASK_ID in the name of an error file (the automatic naming of error files does that, but I want to specify the directory and if possible the name, too)?

Some additional remarks:

  • I am using the -cwd option for qsub inside script.sh, but that is NOT where I want my error files to be stored.
  • I have next to no control over how the cluster works and no root access (I wouldn't know what I would need it for in this context, but anyway...).
  • Apparently our cluster does not use PBS.
  • Yes, my scripts are all executable and, where applicable, start with #!/bin/bash (I also specified the use of bash with the -S /bin/bash option for qsub).
  • There seems to be a solution here, but I am not quite sure how that works and it also appears to be using PBS. If that answer DOES apply to my question and I misunderstood it, please let me know.

I would appreciate any hint in the right direction. Thank you!


Solution

  • I didn't know this either, but it looks like Grid Engine has something called "pseudo environment variables" like $TASK_ID for this purpose. This should work:

    qsub -q QUEUE -N JOBNAME -t 1:10 -e ${ERRFILE}_\$TASK_ID /path/to/script.sh
    

    From the man page:

     -e [[hostname]:]path,...
          ...
    
          If the  pathname  contains  certain  pseudo
          environment  variables, their value will be expanded at
          runtime of the job and will be used to  constitute  the
          standard  error  stream path name. The following pseudo
          environment variables are supported currently:
    
          $HOME       home directory on execution machine
          $USER       user ID of job owner
          $JOB_ID     current job ID
          $JOB_NAME   current job name (see -N option)
          $HOSTNAME   name of the execution host
          $TASK_ID    array job task index number
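
The same pseudo variables can also be used in "#$" lines embedded in the script, since Grid Engine reads those lines itself rather than bash. The following is a rough sketch of what script.sh might look like under that assumption; the output path is hypothetical (chosen by analogy with the error path), and the paths are written out in full because shell variables such as ${ERRFILE} are not expanded in these lines.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -t 1:10
# The "#$" lines are parsed by Grid Engine, not by bash, so only the pseudo
# variables from the man page (such as $TASK_ID) are substituted in them.
#$ -o /home/USER/somedir/output_$TASK_ID
#$ -e /home/USER/somedir/errors_$TASK_ID

# Inside the running task the index is available as $SGE_TASK_ID.
# The fallback value only lets the script be tried outside the cluster.
task_id="${SGE_TASK_ID:-undefined}"
echo "running task ${task_id}"
```

With the options in the script, the submission would reduce to something like qsub -q QUEUE -N JOBNAME /path/to/script.sh. Options given on the command line normally take precedence over embedded ones, so it is worth checking that none are set in both places.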