
How do I get the HTCondor job number in Python and into the output file name?


I want two things:

  1. get the job number from within Python
  2. put it in the output file name.

My submission script looks something like this:

####################
#
# Simple HTCondor submit description file
#
####################

Executable = test_condor.py
Log          = condor_job_log.out
Output       = condor_job_stdout.out
Error        = condor_job_stdout.out
# Use this to make sure 1 gpu is available. The key words are case insensitive. 
REquest_gpus = 1
# Note: to use multiple CPUs instead of the default (one CPU), use request_cpus as well
Request_cpus = 4
# E-mail option
Notify_user = [email protected]

# "Queue" means add the setup until this line to the queue (needs to be at the end of script).
Queue

and I want the output files to have the job number appended, something like:

Log          = condor_job_log{$JOB_ID}.out

I tried to find the relevant environment variable by printing all environment variables in Python, but it was not helpful:

 os.environ = environ({'_CONDOR_ANCESTOR_2148': '3092:1586844319:3811816668', '_CONDOR_ANCESTOR_18122': '18123:1588528659:3276981140', '_CONDOR_ANCESTOR_3092': '18122:1588528659:978447114', 'TEMP': '/srv/condor/execute/dir_18122', '_CONDOR_SCRATCH_DIR': '/srv/condor/execute/dir_18122', '_CONDOR_SLOT': 'slot1_4', 'BATCH_SYSTEM': 'HTCondor', 'TMPDIR': '/srv/condor/execute/dir_18122', '_CONDOR_CHIRP_CONFIG': '/srv/condor/execute/dir_18122/.chirp.config', '_CONDOR_JOB_PIDS': '', 'TMP': '/srv/condor/execute/dir_18122', 'OMP_NUM_THREADS': '4', '_CONDOR_AssignedGPUs': 'CUDA1', '_CONDOR_JOB_AD': '/srv/condor/execute/dir_18122/.job.ad', 'CUDA_VISIBLE_DEVICES': '1', '_CONDOR_JOB_IWD': '/home/me/repo/repo-proj/code', '_CHIRP_DELAYED_UPDATE_PREFIX': 'Chirp', 'GPU_DEVICE_ORDINAL': '1', '_CONDOR_MACHINE_AD': '/srv/condor/execute/dir_18122/.machine.ad'})

since the job number should have been something like the cluster number reported at submission:

Submitting job(s).
1 job(s) submitted to cluster 11011.

and I searched for that number in the environment dump with no luck. So I can't get it from Python that way... so how do I get it?


This wasn't helpful: https://www-auth.cs.wisc.edu/lists/htcondor-users/2005-February/msg00202.shtml

because I don't know what `no env variable as standard but there is another way with the predefined macros: include it in the environment with (for example) environment = CONDOR_ID=$(Cluster).$(Process)` means. Do I do that in my submission script? But my submission script is a Python script... I'm confused. I tried looking at all the environment variable names and nothing matched what I expected.


Solution

  • If you want the job id in the name of the output file, try something like

    output = my_job_$(CLUSTER).out
    

    Note that a condor job id has two parts, the "cluster" and the "proc". The proc is always 0 if you just end the submit file with a

    queue
    

    statement. If you submit multiple procs per cluster with

    queue 100
    

    then the procs will go from 0 to 99.

    In that case, you might want to put both the cluster and the proc into the file name, like

    output = my_job_$(CLUSTER).$(PROCESS).out
    

    Getting the cluster id into the environment isn't too hard. Let's say you want it in the environment variable MY_JOB_ID. You can then add this line to the submit file (before the queue statement):

    environment = "MY_JOB_ID=$(CLUSTER)"
    

    Your Python script will then see the cluster id in the environment variable named MY_JOB_ID.
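
On the Python side, reading the id back could look like the sketch below. It assumes the submit file sets MY_JOB_ID as described above. The helper name `get_condor_job_id` is made up for illustration. The fallback is also an assumption: it relies on the file named by `_CONDOR_JOB_AD` (visible in the environment dump in the question) containing plain `ClusterId = ...` and `ProcId = ...` lines, which may not hold on every setup.

```python
import os
import re


def get_condor_job_id(env=None):
    """Return the job id as a string, or None if it cannot be determined.

    Hypothetical helper, not part of HTCondor.
    """
    if env is None:
        env = os.environ

    # 1. Value injected by the submit file through the `environment` command.
    job_id = env.get("MY_JOB_ID")
    if job_id and job_id.strip():
        return job_id.strip()

    # 2. Fallback (assumption): look for ClusterId / ProcId lines in the
    #    job ad file that HTCondor points to with _CONDOR_JOB_AD.
    ad_path = env.get("_CONDOR_JOB_AD")
    if not ad_path or not os.path.isfile(ad_path):
        return None

    attrs = {}
    with open(ad_path) as fh:
        for line in fh:
            match = re.match(r"\s*(ClusterId|ProcId)\s*=\s*(\d+)\s*$", line)
            if match:
                attrs[match.group(1)] = match.group(2)

    if "ClusterId" not in attrs:
        return None
    # Default the proc to 0, as for a submit file ending in a bare `queue`.
    return f"{attrs['ClusterId']}.{attrs.get('ProcId', '0')}"


if __name__ == "__main__":
    print("condor job id:", get_condor_job_id())
```

With the submit-file line above, the first branch returns only the cluster id. Setting the variable to `$(CLUSTER).$(PROCESS)` instead would make it return both parts, matching the format of the fallback.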