
"sh: logger: command not found" if slurm script submitted via os.system in Python


I am submitting jobs to a cluster that is managed with SLURM. I have a python script that automates my job submission since I am doing hyperparameter tuning.

In my python script, I run

os.system('sbatch ' + fname)

where fname is a text file containing all the settings for the job to be submitted. I used this set-up for a previous cluster I was working on and it worked fine.

Now I'm trying the same set-up on a different cluster and my script doesn't work. I get the complaint: sh: sbatch: command not found. I fixed this by using

os.system('/usr/local/slurm/bin/sbatch ' + fname)

instead. The script now works and is able to submit jobs.

However, when I look at the output file, the first line says

sh: logger: command not found

The job executed fine though. It was a simple print("Hello world") just for testing.

I find this strange since this now occurs after submitting the job to Slurm. If I just do sbatch fname on the terminal, I don't get this complaint printed on the log file.

I'm not sure how to resolve this. I'm concerned that I'll run into problems with other commands if the code gets more complicated.

If it makes a difference, the Python script has the shebang #!/usr/bin/python -u at the start of the file.


Solution

  • Run echo $PATH on the command line. It's a list of directory names separated by colons (:). In Python, before calling os.system(...), set os.environ['PATH'] to a value that includes the directories containing sbatch, logger, etc. Keep the old entries as well.
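
A minimal sketch of that suggestion, in case it helps. The helper name add_to_path is made up for illustration, /usr/local/slurm/bin is taken from the question, and /usr/bin is only a guess at where logger lives (check with `which logger` in a terminal where the command works). It appends the missing directories to PATH without dropping what is already there; because os.system starts its shell with the current os.environ, the change should carry over to sbatch and whatever it runs.

```python
import os


def add_to_path(extra_dirs, environ=os.environ):
    """Append extra_dirs to environ['PATH'], keeping all existing entries.

    add_to_path is a hypothetical helper, not part of any library.
    """
    # Split the current PATH into its directories, ignoring empty pieces
    entries = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    for d in extra_dirs:
        if d not in entries:  # avoid adding the same directory twice
            entries.append(d)
    environ["PATH"] = os.pathsep.join(entries)
    return environ["PATH"]


# Usage (not executed here, since it would submit a real job):
#   add_to_path(["/usr/local/slurm/bin", "/usr/bin"])  # /usr/bin is an assumption
#   os.system("sbatch " + fname)
```

Printing os.environ['PATH'] from inside the Python script and comparing it with the output of echo $PATH in the terminal should show which directories are missing.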