When submitting jobs with qsub on a StarCluster / SGE cluster, is there an easy way to ensure that each node receives at most one job at a time? Multiple jobs end up on the same node, which leads to out-of-memory (OOM) errors.
I tried using -l cpu=8, but I think that only checks the number of cores on the box itself, not the number of cores currently in use.
I also tried -l slots=8, but then I get:
Unable to run job: "job" denied: use parallel environments instead of requesting slots explicitly.
In your config file (.starcluster/config) add this section:
[plugin sge]
setup_class = starcluster.plugins.sge.SGEPlugin
slots_per_host = 1
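
For that section to take effect, the plugin also has to be listed in the cluster template you launch. A minimal sketch, assuming a template named mycluster (a hypothetical name; use your own) and assuming the default SGE setup has to be switched off with disable_queue so that the plugin configures the queue instead. Check the StarCluster SGE plugin documentation for your version before relying on this:

```ini
[cluster mycluster]
# "mycluster" is a hypothetical template name; replace it with your own.
# Assumption: skip the built-in SGE setup so the sge plugin configures it.
disable_queue = True
# Enable the [plugin sge] section defined above.
plugins = sge
```

With one slot per host, an ordinary qsub job (which requests one slot) should occupy the whole node, so the scheduler should not place a second job there until the first one finishes.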