
Code updates during submission of condor jobs


When using condor for distributing jobs across a dedicated computer cluster, one first submits the jobs to the cluster and then waits for them to actually start running. Depending on multiple factors, they might stay in an idle state for quite some time, even hours.

Let us say I have just compiled the code that the jobs will run, and I submit the jobs via a condor submission file. I then realize I would like to change the original code, either because there is a bug in it or because I want to try different parameters. If the new build finishes compiling while the jobs are still idle, which version will run on the cluster? In other words, does condor store a snapshot of the code when the jobs are submitted, or does it just pick up whatever is there when the jobs start running?

Although the first option sounds far more reasonable to me, I have evidence from my own work that the second is what actually happens.


Solution

  • When condor_submit is run, the executable is copied to the spool directory under the scheduler; this is called spooling. If you want to be able to change the executable after submission, probably the best approach is to make your executable a shell script that calls the real executable, and to list the real executable in transfer_input_files.
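
The wrapper approach described above could look roughly like the following. This is only a minimal sketch: the file names run_job.sh, my_program and job.sub are hypothetical, and it assumes a vanilla-universe job with file transfer enabled. The script just writes the two files; it does not submit anything.

```shell
#!/bin/sh
# Sketch: generate a wrapper script and a submit description file.
# All file names (run_job.sh, my_program, job.sub) are hypothetical.
set -eu

# Wrapper that condor runs as the job's "executable".
cat > run_job.sh <<'EOF'
#!/bin/sh
# The real binary is expected in the job's scratch directory,
# delivered there through transfer_input_files.
chmod +x ./my_program
exec ./my_program "$@"
EOF
chmod +x run_job.sh

# Submit description file that points at the wrapper, not the binary.
cat > job.sub <<'EOF'
executable              = run_job.sh
transfer_input_files    = my_program
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = job.$(Cluster).$(Process).out
error                   = job.$(Cluster).$(Process).err
log                     = job.log
queue
EOF

# Submit with: condor_submit job.sub
```

With this layout the wrapper rarely needs to change, and my_program is treated as an ordinary input file. Whether a rebuilt my_program is actually picked up by jobs that are still idle depends on how the pool handles spooling of input files, so it is worth testing on a single job first.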