snakemake error: 'Wildcards' object has no attribute 'batch'


I don't understand how to redefine my snakemake rule to fix the Wildcards issue below.

Ignore the logic of the batches; it makes sense inside the Python script. I want the rule to run once for each batch, 1-20. I use the BATCHES list for {batch} in the output, and in the shell command I use {wildcards.batch}:


OUTDIR="my_dir/"
nBATCHES = 20
BATCHES = list(range(1,21)) # [1,2,3 ..20] list

[...]

rule step5:
    input:
        ids = expand('{IDLIST}', IDLIST=IDLIST)
    output:
        type1 = expand('{OUTDIR}/resources/{batch}_output_type1.csv.gz', OUTDIR=OUTDIR, batch=BATCHES),
        type2 = expand('{OUTDIR}/resources/{batch}_output_type2.csv.gz', OUTDIR=OUTDIR, batch=BATCHES),
        type3 = expand('{OUTDIR}/resources/{batch}_output_type3.csv.gz', OUTDIR=OUTDIR, batch=BATCHES)
    shell:
        "./some_script.py --outdir {OUTDIR} --idlist {input.ids}  --total_batches {nBATCHES} --current_batch {wildcards.batch}"

Error:

RuleException in rule step5  in line 241 of Snakefile:

AttributeError: 'Wildcards' object has no attribute 'batch', when formatting the following:

./somescript.py --outdir {OUTDIR} --idlist {input.idlist}  --totalbatches {nBATCHES} --current_batch {wildcards.batch}

Running the script manually for a single batch looks like this, and works (total_batches is a constant; current_batch is supposed to iterate):

./somescript.py --outdir my_dir/ --idlist ids.csv --total_batches 20 --current_batch 1


Solution

  • You seem to want to run the rule step5 once for each batch in BATCHES. So you need to structure your Snakefile to do exactly that.

    In the following Snakefile, the rule all requests one type1 file for every combination of OUTDIR and BATCHES, so Snakemake runs your rule step5 once per batch:

    OUTDIR = "my_dir"
    nBATCHES = 20
    BATCHES = list(range(1, 21))  # [1,2,3 ..20] list
    IDLIST = ["a", "b"] # dummy data, I don't have the original
    
    rule all:
        input:
            type1=expand(
                "{OUTDIR}/resources/{batch}_output_type1.csv.gz",
                OUTDIR=OUTDIR,
                batch=BATCHES,
            ),
    
    
    rule step5:
        input:
            ids=expand("{IDLIST}", IDLIST=IDLIST),
        output:
            type1="{OUTDIR}/resources/{batch}_output_type1.csv.gz",
            type2="{OUTDIR}/resources/{batch}_output_type2.csv.gz",
            type3="{OUTDIR}/resources/{batch}_output_type3.csv.gz",
        shell:
            "./some_script.py --outdir {OUTDIR} --idlist {input.ids}  --total_batches {nBATCHES} --current_batch {wildcards.batch}"
    

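    In your earlier version, {batch} was only an expand placeholder, not a wildcard. expand filled it in with every value of BATCHES, so the rule declared all 60 files as the output of a single job, was run only once, and had no batch wildcard for {wildcards.batch} to refer to.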

    Instead of the rule all, the requesting rule could be any downstream rule that takes one or more of the outputs generated by step5 as its input.
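
The difference between an expand placeholder and a wildcard can be illustrated in plain Python. This is only a rough sketch: expand_sketch is a hypothetical, simplified stand-in for Snakemake's expand, and the regex matching only approximates how a wildcard value is inferred from a requested file (it assumes the default wildcard pattern `.+`). It is not Snakemake's actual implementation.

```python
import re
from itertools import product


def expand_sketch(pattern, **values):
    """Hypothetical stand-in for expand(): fill every placeholder
    with every combination of the given values."""
    keys = list(values)
    lists = [v if isinstance(v, (list, tuple)) else [v] for v in values.values()]
    return [pattern.format(**dict(zip(keys, combo))) for combo in product(*lists)]


OUTDIR = "my_dir"
BATCHES = list(range(1, 21))

# Original rule: the output is a flat list of 20 concrete paths.
# No "{batch}" survives, so there is nothing for wildcards.batch to hold.
expanded = expand_sketch(
    "{OUTDIR}/resources/{batch}_output_type1.csv.gz", OUTDIR=OUTDIR, batch=BATCHES
)

# Corrected rule: the output is one pattern that still contains {batch}.
pattern = f"{OUTDIR}/resources/{{batch}}_output_type1.csv.gz"

# Rough approximation of wildcard inference: match a requested file
# against the pattern and read the wildcard value out of the match.
prefix, suffix = pattern.split("{batch}")
regex = re.compile(re.escape(prefix) + r"(?P<batch>.+)" + re.escape(suffix))
inferred = [regex.fullmatch(path).group("batch") for path in expanded]

print(len(expanded), expanded[0])
print(pattern)
print(inferred[:3])
```

Each file requested by the rule all matches the pattern with a different batch value, which is why the corrected step5 runs as 20 separate jobs. Note that the inferred values are strings, not integers.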