SGE submitted job state doesn't change from "qw"

I'm using Sun Grid Engine (SGE) on Ubuntu 14.04 to queue jobs on a multicore CPU. I've installed and set up SGE on my system. I created a "hello_world" directory containing two shell scripts, "" and "": the first contains a simple command, and the second contains the qsub command that submits the first script as a job. Here's what "" includes:


echo "Hello world" > /home/theodore/tmp/hello_world/hello_world_output.txt

And here's what "" includes:


qsub \
  -e /home/hello_world/hello_world_qsub.error \
  -o /home/hello_world/hello_world_qsub.log \

After making the second script executable and running it with the "./" command from that directory, the output looks reasonable:

Your job 1 ("") has been submitted

But the output of the "qstat" command is frustrating:

    job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
          1 0.50000 hello_worl mhr          qw    05/16/2016 20:26:23                                    1

The "state" column always stays at "qw" and never changes to "r".

Here's the output of the "qstat -j 1" command:

job_number:                 1
exec_file:                  job_scripts/1
submission_time:            Mon May 16 20:26:23 2016
owner:                      mhr
uid:                        1000
group:                      mhr
gid:                        1000
sge_o_home:                 /home/mhr
sge_o_log_name:             mhr
sge_o_path:                 /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
sge_o_shell:                /bin/bash
sge_o_workdir:              /home/mhr/hello_world
sge_o_host:                 localhost
account:                    sge
stderr_path_list:           NONE:NONE:/home/hello_world/hello_world_qsub.error
mail_list:                  mhr@localhost
notify:                     FALSE
stdout_path_list:           NONE:NONE:/home/hello_world/hello_world_qsub.log
jobshare:                   0
script_file:                ./
scheduling info:            queue instance "mainqueue@localhost" dropped because it is temporarily not available
                            All queues dropped because of overload or full

And here's the output of the "qhost" command:

global                  -               -     -       -       -       -       -
localhost               -               -     -       -       -       -       -

What should I do to make my jobs run and finish their task?


  • From your qhost output, it looks like the host "localhost" is registered with SGE. However, sge_execd on "localhost" is either not running or not configured properly. If it were, qhost would report statistics (architecture, CPU count, load, memory) for "localhost" instead of dashes.
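
As a rough way to apply this check, the sketch below filters qhost output for hosts whose columns are all dashes, as in the question. The function name unreported_hosts is made up for illustration and is not part of SGE. It only catches the all-dashes pattern shown above, so treat it as a starting point rather than a complete health check.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): read qhost output on stdin and print
# every host whose columns after the host name are all "-", i.e. hosts for
# which no execution daemon is reporting. The "global" pseudo-host is skipped.
unreported_hosts() {
  awk '
    $1 != "global" && NF > 1 {
      silent = 1
      for (i = 2; i <= NF; i++) {
        if ($i != "-") silent = 0
      }
      if (silent) print $1
    }
  '
}

# Usage on the machine running SGE (assumes qhost is on PATH):
#   qhost | unreported_hosts
#
# To see whether the daemon process exists at all:
#   pgrep -l sge_execd
```

Fed the two qhost lines from the question, this prints localhost. If pgrep shows no sge_execd process, starting the execution daemon is the likely next step. How to start it depends on how SGE was installed.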