
Submitting jobs using Python


I am trying to submit a job to a cluster at our institute using Python scripts.

 import subprocess
 from subprocess import Popen

 compile_cmd = 'ifort -openmp ran_numbers.f90 ' + fname \
               + ' ompscmf.f90 -o scmf.o'
 subprocess.Popen(compile_cmd, shell=True)

 Popen('qsub launcher', shell=True)

The problem is that the system hangs at this point. Are there any obvious mistakes in the above script? All the files mentioned in the code are present in that directory (I have cross-checked that). qsub is the command used to submit jobs to our cluster, and fname is the name of a file that I created earlier in the process.


Solution

  • I have a script that I used to submit multiple jobs to our cluster using qsub. qsub typically takes job submissions in the form

    qsub [qsub options] job
    

    In my line of work, job is typically a bash (.sh) or Python (.py) script that actually calls the programs or code to be run on each node. If I wanted to submit a job called "test_job.sh" with the maximum walltime, I would do

    qsub -l walltime=72:00:00 test_job.sh
    

    This amounts to the following Python code:

    from subprocess import call
    
    qsub_call = "qsub -l walltime=72:00:00 %s"
    call(qsub_call % "test_job.sh", shell=True)
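
    If you also want to confirm that the submission went through and grab the job ID that qsub prints, here is a minimal sketch, assuming Python 3.7+ and a PBS/Torque-style qsub that reports the job ID on stdout:

    import subprocess

    # Submit the job and capture qsub's output instead of fire-and-forget
    result = subprocess.run("qsub -l walltime=72:00:00 test_job.sh",
                            shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        # qsub refused the job; surface the scheduler's error message
        raise RuntimeError(result.stderr.strip())

    job_id = result.stdout.strip()  # typically something like "12345.headnode"
    print("submitted job", job_id)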
    

    Alternatively, what if you had a bash script that looked like

    #!/bin/bash
    
    filename="your_filename_here"
    ifort -openmp ran_numbers.f90 $filename ompscmf.f90 -o scmf.o
    

    then submitted this with qsub job.sh? That way the compile command itself runs as part of the job, with the filename baked into the script.
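
    Putting that together in Python (a rough sketch; job.sh and the placeholder filename are just stand-ins for whatever you generate):

    from subprocess import call

    # Write a small job script that does the compile inside the job,
    # then hand it to qsub
    job_script = 'job.sh'
    with open(job_script, 'w') as f:
        f.write('#!/bin/bash\n')
        f.write('ifort -openmp ran_numbers.f90 your_filename_here ompscmf.f90 -o scmf.o\n')

    call('qsub %s' % job_script, shell=True)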


    Edit: Honestly, the optimal job queueing scheme varies from cluster to cluster. One simple way to streamline your job submission scripts is to find out how many CPUs are available on each node. Some of the more recent queueing systems allow you to submit many single-CPU jobs and will pack them onto as few nodes as possible; however, some older clusters won't do that, and submitting many individual jobs is frowned upon.

    Say that each node in your cluster has 8 CPUs. You could write your script like

    #!/bin/bash
    #PBS -l nodes=1:ppn=8
    
    for ((i=0; i<8; i++))
    do
        ./myjob.sh filename_${i} &
    done
    wait
    

    What this will do is run 8 tasks at once on a single node (the & puts each command in the background) and wait until all 8 are finished. This may be optimal for clusters with many CPUs per node (for example, one cluster that I used has 48 CPUs per node).
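
    If you'd rather generate those packed scripts from Python instead of writing them by hand, here is a rough sketch (the chunk size of 8, the .txt filter, and myjob.sh are assumptions to adapt to your setup):

    #!/usr/bin/env python
    import os
    from subprocess import call

    chunk_size = 8   # CPUs per node on this hypothetical cluster
    directory = '.'  # placeholder: where your input files live

    filenames = [os.path.join(root, f) for root, _, files in os.walk(directory)
                                       for f in files if f.endswith('.txt')]

    # Write one job script per group of 8 files; each script runs its tasks
    # in the background on a single node and waits for all of them to finish
    for n, start in enumerate(range(0, len(filenames), chunk_size)):
        script = 'packed_job_%i.sh' % n
        with open(script, 'w') as f:
            f.write('#!/bin/bash\n')
            f.write('#PBS -l nodes=1:ppn=%i\n' % chunk_size)
            for fname in filenames[start:start + chunk_size]:
                f.write('./myjob.sh %s &\n' % fname)
            f.write('wait\n')
        call('qsub %s' % script, shell=True)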

    Alternatively, if submitting many single-core jobs is optimal and your submission code above isn't working, you could use Python to generate the bash scripts to pass to qsub.

    #!/usr/bin/env python
    import os
    from subprocess import call

    # Header lines shared by every generated single-core job script
    bash_lines = ['#!/bin/bash\n', '#PBS -l nodes=1:ppn=1\n']
    bash_name = 'myjob_%i.sh'
    job_call = 'ifort -openmp ran_numbers.f90 %s ompscmf.f90 -o scmf.o &\n'
    qsub_call = 'qsub myjob_%i.sh'

    directory = '.'  # placeholder: the directory holding your input files

    # Find every .txt file below directory, write one job script per file,
    # and hand each script to qsub
    filenames = [os.path.join(root, f) for root, _, files in os.walk(directory)
                                       for f in files if f.endswith('.txt')]
    for i, filename in enumerate(filenames):
        with open(bash_name % i, 'w') as bash_file:
            bash_file.writelines(bash_lines + [job_call % filename, 'wait\n'])
        call(qsub_call % i, shell=True)