Why do I have many more jobs `started` than running or suspended?


According to the bqueues manual page:

STARTED
         Number of job slots used by running or
         suspended jobs owned by users or user groups in
         the queue.

According to bqueues, I have 369 jobs started:

$ bqueues -r lotus | egrep '(STARTED|gholl)'
 USER/GROUP   SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME   ADJUST
gholl          10       0.006    369        0   2334366.5   723589       0.000

But when I run bjobs, it only shows 24 jobs that are running or suspended:

$ bjobs | egrep '(RUN|SUSP)' | wc -l
24

What explains the discrepancy between 24 jobs running and 369 jobs started?


Solution

  • The number in STARTED counts job slots, not jobs. One job may take up more than one slot, for example if it runs on multiple cores. If a job is submitted using bsub with the flag -n 16, then that job will use 16 slots. 23×16+1=369, so the numbers in the example above are consistent with user gholl having 23 jobs using 16 slots each and 1 job using 1 slot.
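
The slot arithmetic can be checked with a small sketch. The helper name sum_started_slots and its two-column input format ("<STAT> <SLOTS>", one job per line) are assumptions made for illustration, not part of LSF. The state filter mirrors the egrep from the question.

```shell
# Hypothetical helper: sum the slots of running or suspended jobs.
# Input on stdin: one job per line, "<STAT> <SLOTS>".
sum_started_slots() {
    # Count RUN plus any state ending in SUSP, like the egrep in the question.
    awk '$1 == "RUN" || $1 ~ /SUSP$/ { total += $2 } END { print total + 0 }'
}

# Simulated job list matching the question:
# 23 jobs submitted with -n 16, plus 1 single-slot job (24 jobs in total).
{
    for i in $(seq 1 23); do
        echo "RUN 16"
    done
    echo "RUN 1"
} | sum_started_slots
# prints 369
```

On LSF versions where bjobs supports customized output, something like `bjobs -o "stat slots" -noheader | sum_started_slots` might feed the helper real data. That invocation is an assumption; check the bjobs manual page on your cluster before relying on it.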