
Ensuring One Job per Node on StarCluster / Sun Grid Engine (SGE)


When submitting jobs with qsub on a StarCluster / SGE cluster, is there an easy way to ensure that each node receives at most one job at a time? Multiple jobs are ending up on the same node, which leads to out-of-memory (OOM) failures.

I tried using -l cpu=8, but I think that only checks the number of cores on the machine itself, not the number of cores currently in use.

I also tried -l slots=8, but then I get:

Unable to run job: "job" denied: use parallel environments instead of requesting slots explicitly.

Solution

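  • In your StarCluster config file (~/.starcluster/config), add the section below. Setting slots_per_host = 1 gives each node a single SGE slot, so the scheduler places at most one single-slot job on a node at a time: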

    [plugin sge]
    setup_class = starcluster.plugins.sge.SGEPlugin
    slots_per_host = 1
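
Defining the plugin section on its own may not be enough, because a plugin normally has to be listed in a cluster template before StarCluster runs it. The fragment below is a minimal sketch of that wiring, not a verified configuration: the template name smallcluster is a placeholder for whatever template you already use, and the disable_queue line assumes you want the built-in SGE setup skipped so that this customized plugin instance is the only one configuring the queue.

```ini
[cluster smallcluster]
# ... keep your existing template settings (keyname, cluster_size, etc.) ...

# Assumption: skip StarCluster's default SGE install so the
# customized plugin below handles queue setup instead.
disable_queue = True

# Run the [plugin sge] section defined above when the cluster starts.
plugins = sge
```

The setting is applied when the queue is configured, so an already running cluster would likely need to be restarted or have its plugins re-run. Afterwards, running qhost -q on the master should show the slot count for each host in the queue, which is a quick way to check that each node reports a single slot.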