From my understanding of the Snakemake documentation, if I wrap the benchmark file of a rule in repeat(), Snakemake should run the rule a given number of times so that I can get an idea of the variability of the timing and memory measurements. However, when I look at the benchmark file of the rule I am trying to profile, only the first measurement looks accurate; the subsequent measurements are so small that it seems as if the rule did not really get rerun. Is my understanding of repeat() incorrect, or is this a bug in Snakemake?
This is the rule from my Snakefile that I am trying to profile with repeat():
rule br:
    input:
        dicz = "output/br/{prefix}/{ref}/{prefix}.{ref}.br.dicz",
        parse = "output/br/{prefix}/{ref}/{prefix}.{ref}.br.parse",
        script = config["script"]
    benchmark: repeat("benchmarks/br/{prefix}.{ref}.br.script.benchmark.txt", 3)
    threads: 32
    params:
        outprefix = "output/br/{prefix}/{ref}/{prefix}.{ref}.br"
    output:
        C = "output/br/{prefix}/{ref}/{prefix}.{ref}.br.C",
        R = "output/br/{prefix}/{ref}/{prefix}.{ref}.br.R",
        log = "output/br/{prefix}/{ref}/{prefix}.{ref}.br.log"
    shell:
        """
        module load gcc
        {input.script} {params.outprefix} -t {threads}
        """
This is a benchmark file produced by this rule:
s         h:m:s    max_rss   max_vms   max_uss   max_pss   io_in    io_out   mean_load  cpu_time
468.4567  0:07:48  12440.35  12493.91  12432.10  12435.02  2039.47  2441.37  85.57      427.32
1.8935    0:00:01  15.63     105.46    9.15      10.48     0.00     0.00     9.00       0.26
0.9830    0:00:00  13.86     103.52    7.23      8.61      0.00     0.00     0.00       0.16
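Note that the second and third rows also show io_in and io_out of 0.00, i.e. the repeats read and wrote essentially nothing. The same pattern can be reproduced outside Snakemake with the minimal Python sketch below. It assumes (hypothetically) a script that skips its expensive step when its output file already exists. `toy_script` and `benchmark_repeat` are illustrative names, not part of Snakemake, and the loop only approximates what repeat(..., 3) does: run the same job several times back to back and record one timing per run.

```python
import os
import tempfile
import time


def toy_script(outprefix: str) -> None:
    """Hypothetical stand-in for {input.script}: it does the expensive work
    only if its output file is missing (a common 'resume' behaviour)."""
    out = outprefix + ".C"
    if os.path.exists(out):
        return  # output already present: skip the work entirely
    time.sleep(0.2)  # pretend this is the expensive computation
    with open(out, "w") as fh:
        fh.write("result\n")


def benchmark_repeat(func, n, *args):
    """Run func n times back to back without cleaning up in between and
    return the wall-clock time of each run (like the 's' column)."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return timings


with tempfile.TemporaryDirectory() as d:
    timings = benchmark_repeat(toy_script, 3, os.path.join(d, "sample.br"))
    for t in timings:
        print(f"{t:.4f}")
```

Only the first run pays for the computation; the later runs just find the existing output and return, which matches the shape of the benchmark table above.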
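Consider what your script does when it is rerun. If it finds the results from the first run and skips the work, the second and third measurements only capture that quick check, which would explain the tiny times. In that case you may want to remove the previous results before each run. The catch is that you lose your output, so you may want to copy it to a new directory first.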
rule br:
    input:
        dicz = "output/br/{prefix}/{ref}/{prefix}.{ref}.br.dicz",
        parse = "output/br/{prefix}/{ref}/{prefix}.{ref}.br.parse",
        script = config["script"]
    # repeat(..., 3): run the job three times, one row per run in the benchmark file
    benchmark: repeat("benchmarks/br/{prefix}.{ref}.br.script.benchmark.txt", 3)
    threads: 32
    params:
        outprefix = "output/br/{prefix}/{ref}/{prefix}.{ref}.br"
    # temp(): Snakemake deletes these files once no remaining job needs them as input;
    # this happens after the job has completed, not between benchmark repeats
    output:
        C = temp("output/br/{prefix}/{ref}/{prefix}.{ref}.br.C"),
        R = temp("output/br/{prefix}/{ref}/{prefix}.{ref}.br.R"),
        log = temp("output/br/{prefix}/{ref}/{prefix}.{ref}.br.log")
    shell:
        """
        module load gcc
        {input.script} {params.outprefix} -t {threads}
        """
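To make every repetition redo the full computation, the previous results have to be gone before each run starts. The Python sketch below illustrates that idea under the same hypothetical assumption as before (a script that skips work when its output exists). `toy_script` and `benchmark_repeat_clean` are illustrative names, not Snakemake functions.

```python
import os
import tempfile
import time


def toy_script(outprefix: str) -> None:
    """Hypothetical stand-in for {input.script}: skips the expensive step
    when its output file already exists."""
    out = outprefix + ".C"
    if os.path.exists(out):
        return
    time.sleep(0.2)  # pretend this is the expensive computation
    with open(out, "w") as fh:
        fh.write("result\n")


def benchmark_repeat_clean(func, n, outprefix, outputs):
    """Run func n times, deleting the listed outputs before each run so
    that every repetition performs the full computation."""
    timings = []
    for _ in range(n):
        for path in outputs:
            if os.path.exists(path):
                os.remove(path)  # discard results left by the previous run
        start = time.perf_counter()
        func(outprefix)
        timings.append(time.perf_counter() - start)
    return timings


with tempfile.TemporaryDirectory() as d:
    prefix = os.path.join(d, "sample.br")
    timings = benchmark_repeat_clean(toy_script, 3, prefix, [prefix + ".C"])
    # the last run recreates the output, so it still exists afterwards
    final_output_exists = os.path.exists(prefix + ".C")
    for t in timings:
        print(f"{t:.4f}")
```

In the Snakefile itself, an analogous (untested) change would be to start the shell: block with something like rm -f {output}, so that each repetition begins without the files from the previous one while the final repetition still leaves the outputs in place. If the script also resumes from other files under {params.outprefix} that are not declared as outputs, those would need to be removed as well.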