I am trying to run qsub jobs on a SGE(Sun Grid Engine) cluster that supports a maximum of 688 jobs. I would like to know if there is any way to find out the total number of jobs that are currently running on the cluster so I can submit jobs based on the current cluster load.
I plan to do something like: sleep for 1 minute and check again if the number of jobs in the cluster is < 688 and then submit jobs further.
And just to clarify my question pertains to knowing the total number of jobs submitted on the cluster not just the jobs I have submitted currently.
Thanks in advance.
You can use qstat
to list the jobs of all users; this with awk
and wc
can be used to find out the total number of jobs on the cluster:
qstat -u "*" | awk '{if ($5 == "r" || $5 == "qw") print $0;}' | wc -l
The above command also takes into account jobs that are queued and waiting to be scheduled on a compute node.
However, the cluster sysadmins could disallow users to check on jobs that don't belong to them. You can verify if you can see other user's jobs by running:
qstat -u "*"
If you know for a fact that another user is running a job and yet you can't see it while running the above command, it's most likely that the sys admins disabled that option.
Afterthought: from my understanding, you're just a regular cluster user - why are you even bothering to submit jobs this way. Why don't you just submit all the jobs that you want and if the cluster can't schedule your jobs, it will just put them in a qw
state and schedule them whenever SGE feels is the most appropriate time.