BACKGROUND: I have to adapt my Snakemake pipeline from single-node use to a cluster with resource management. With a SLURM-specific Snakemake profile, my rules are successfully submitted as SLURM jobs, so I went on to add the Snakemake directive resources to every non-local rule to optimize queue scheduling. These settings were adopted and my pipeline finished as intended.
EXAMPLE:
rule ruleA:
    group: "group_1_init"
    resources:
        cpus=1,
        time="00:04:00"

rule ruleB:
    group: "group_1_init"
    resources:
        cpus=1,
        time="00:05:00"
ruleA and ruleB are submitted as a single job to a computing node.
PROBLEM: My pipeline has many small, single-CPU jobs that I binned with the Snakemake rule directive group. When the rules in a group declare different time values, as in the example, Snakemake fails with this error:
WorkflowError:
Failed to group jobs together. Resource time is a string but not all group jobs require the same value. Observed: 00:05:00 != 00:04:00.
I guess there should be only one resources setting per group, but I could not find any documentation online on the logic behind it.
QUESTION: How do I define varying resource requirements in group jobs? For example, should time reflect the computing time of the whole group of jobs, or does Snakemake sum up the rule times within a group when it sets the parameter for the SLURM job submission? For cpus, in turn, I would expect the maximum cpus among all rules in the group.
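Without parameter conversion by an additional script, the resource time has to be an integer and is interpreted as minutes. As the error message indicates, a string-valued resource can only be grouped when every job in the group requires the same value, so the "HH:MM:SS" strings have to be replaced by numbers. This is the correction: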
rule ruleA:
    group: "group_1_init"
    resources:
        cpus=1,
        time=4

rule ruleB:
    group: "group_1_init"
    resources:
        cpus=1,
        time=5
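
If you would rather keep writing the limits in "HH:MM:SS" form, the conversion to integer minutes can be done in plain Python at the top of the Snakefile. The following is only a minimal sketch: hms_to_minutes is a hypothetical helper name, not part of Snakemake, and rounding up to the next full minute is an assumption made here so that a job is never given less time than requested.

```python
def hms_to_minutes(hms: str) -> int:
    """Convert an "HH:MM:SS" string to whole minutes, rounding up.

    Hypothetical helper for illustration; it is not a Snakemake function.
    """
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    # Integer ceiling division: any leftover seconds count as a full minute.
    return (total_seconds + 59) // 60


# The two limits from the example above, expressed as integer minutes.
ruleA_time = hms_to_minutes("00:04:00")
ruleB_time = hms_to_minutes("00:05:00")
```

Because a Snakefile is Python, the helper could then be called directly in the directive, e.g. time=hms_to_minutes("00:04:00"), so that each rule ends up with an integer value.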