
Grep qstat output and copy files once done


I am using the PBS job scheduler on my cluster. In bash, I would like to monitor the job status and, once the job is done, copy the results to a certain location (/data/myfolder/).

My qstat output looks like this:

    JobID  Username Queue Jobname SessID NDS TSK Memory Time  Status 
    ----------------------------------------------------------------
    717.XXXXXX  user XXXX       SS  2323283 1  24  122gb --     E   

Thanks in advance


Solution

  • There is a script here that does this (for SGE). I started to excerpt just the relevant parts for you, but it will probably be easier to start with the full script: insert your qsub commands inside the submit_job function, then put the code that copies your results after the wait_job_finish call. You can remove the log printing at the end if you want.

    #!/bin/bash
    
    # this script will submit a qsub job and check on host information for the cluster
    # node which it ends up running on
    # ~~~~~ CUSTOM FUNCTIONS ~~~~~ #
    submit_job () {
        local job_name="$1"
        qsub -j y -N "$job_name" -o :${PWD}/ -e :${PWD}/ <<E0F
    set -x
    hostname
    cat /etc/hosts
    python -c "import socket; print(socket.gethostbyname(socket.gethostname()))"
    # sleep 5000
    E0F
    }
    
    wait_job_start () {
        local job_id="$1"
        printf "waiting for job to start"
        while ! qstat | grep "$job_id" | grep -Eq '[[:space:]]r[[:space:]]'
        do
            printf "."
            sleep 1
        done
        printf "\n\n"
    
        local node_name="$(get_node_name "$job_id")"
        printf "Job is running on node $node_name \n\n"
    }
    
    wait_job_finish () {
        local job_id="$1"
        printf "waiting for job to finish"
        while qstat | grep -q "$job_id"
        do
            printf "."
            sleep 1
        done
        printf "\n\n"
    }
    
    check_for_job_submission () {
        local job_id="$1"
        # check whether the job ID shows up in the queue
        if qstat | grep -q "$job_id" ; then
            echo "it's there"
        else
            echo "not there"
        fi
    }
    
    get_node_name () {
        local job_id="$1"
        qstat | grep "$job_id" | sed -e 's|^.*[[:space:]]\([a-zA-Z0-9.]*@[^ ]*\).*$|\1|g'
    }
    # ~~~~~ RUN ~~~~~ #
    printf "Submitting cluster job to get node hostname and IP\n\n"
    
    job_name="get_node_hostnames"
    job_id="$(submit_job "$job_name")" # Your job 832606 ("get_node_hostnames") has been submitted
    job_id="$(echo "$job_id" | sed -e 's|.*[[:space:]]\([[:digit:]]*\)[[:space:]].*|\1|g' )"
    job_stdout_log="${job_name}.o${job_id}"
    
    printf "Job ID:\t%s\nJob Name:\t%s\n\n" "$job_id" "$job_name"
    
    wait_job_start "$job_id"
    wait_job_finish "$job_id"
    
    printf "\n\nReading log file ${job_stdout_log}\n\n"
    [ -f "$job_stdout_log" ] && cat "$job_stdout_log"
    printf "\n\nRemoving log file ${job_stdout_log}\n\n"
    [ -f "$job_stdout_log" ] && rm -f "$job_stdout_log"
    

    Sidenote: If you like Python, there is a slightly more robust equivalent here

    You'll probably need some small tweaks to adapt both for your PBS system, since they were written for SGE (for example, SGE's qstat marks a running job with a lowercase r, while PBS uses R, and your sample output shows E for a job that is exiting).
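
    If you only need the "wait until the job is gone, then copy" part on PBS, a minimal sketch could look like this. The job ID, source directory, destination, and poll interval below are placeholder assumptions; the loop relies on the fact that a finished PBS job eventually drops out of the default qstat listing:

    ```shell
    #!/bin/bash
    # Minimal PBS sketch (not the full SGE script above): poll qstat
    # until the job leaves the queue, then copy the results.
    # Job ID, paths, and interval are hypothetical examples.

    wait_and_copy () {
        local job_id="$1"            # numeric part of the job ID, e.g. 717
        local src="$2"               # directory your job writes results into
        local dest="$3"              # e.g. /data/myfolder/
        local interval="${4:-10}"    # seconds between qstat polls
        # A finished PBS job eventually disappears from the default
        # qstat listing, so loop while its ID is still present.
        while qstat 2>/dev/null | grep -q "^${job_id}\."
        do
            sleep "$interval"
        done
        mkdir -p "$dest"
        cp -r "${src}/." "$dest"
    }

    # Example call (hypothetical paths):
    # wait_and_copy 717 "$HOME/job_output" /data/myfolder/
    ```

    Anchoring the grep on `^${job_id}\.` avoids matching a different job whose ID merely contains the same digits.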