mpi, cluster-computing, hpc, lsf

LSF: about requesting nodes, exclusively selecting nodes and running mpirun


I am very confused about submitting a job in a multi-user cluster environment. I use a script with the following header:

#BSUB -L /bin/bash
#BSUB -n 10
#BSUB -J jobname
#BSUB -oo log/output.%J
#BSUB -eo log/error.%J
#BSUB -q queue_name
#BSUB -P project_name
#BSUB -R "span[ptile=12]"
#BSUB -W 2:0

mpirun ./someexecutable

My intention is that this job should run on 10 processors (cores) and span 1 entire node (each node on the machine has 12 cores), so that the node is fully used by me and no other user interferes with my node. I have explicitly checked, and my code appears to be using 10 cores at runtime.

Now I am talking with somebody who tells me that this way I am actually asking for 120 cores. I think this is not right, but maybe I have misunderstood the instructions:

https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_admin/span_string.html

Should I use this instead?

#BSUB -R "span[hosts=1]" 

Solution

  • this job should run on 10 processors (cores) and span 1 entire node

    Yes. To do that, use

    #BSUB -n 10
    #BSUB -R "span[hosts=1]"
    

    This places the whole job on a single host.

    and no other user interferes with my node

    You can get exclusive access to the host with

    #BSUB -x
    

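    FYI, you can think of

    #BSUB -R "span[ptile=x]"
    

    as "put at most x slots on a single host". The total number of slots is still set by -n, so -n 10 with span[ptile=12] requests 10 slots in total, not 120.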
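
Putting the pieces above together, the job script from the question might look like the sketch below. It keeps the question's own placeholders (jobname, queue_name, project_name, ./someexecutable) and only swaps the span string and adds -x. It assumes the queue is configured to allow exclusive jobs; if it is not, the -x request may be rejected, so check with the cluster administrators.

```shell
#!/bin/bash
#BSUB -L /bin/bash
#BSUB -n 10                     # 10 slots in total for the whole job
#BSUB -R "span[hosts=1]"        # all 10 slots must come from a single host
#BSUB -x                        # exclusive use of that host (assumes the queue allows it)
#BSUB -J jobname
#BSUB -oo log/output.%J
#BSUB -eo log/error.%J
#BSUB -q queue_name
#BSUB -P project_name
#BSUB -W 2:0                    # run time limit of 2 hours

# As in the question: this relies on the site's MPI being integrated with LSF
# so that mpirun picks up the 10 allocated slots on its own.
mpirun ./someexecutable
```

The script would be submitted with `bsub < script.sh` so that LSF reads the #BSUB lines. With -x the two unused cores on the 12-core node stay idle rather than being given to another job.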