Tags: python, wildcard, snakemake

Is it possible in Snakemake to have an "optional" wildcard or an optional part of the filename?


I wonder whether these two Snakemake rules can be merged into a single rule (their "run:" blocks do the same thing):

rule without_d:
    input:
        vals_pca    = 'stats/input_{type}.npz',
    output:
        for_cnv     = 'stats/output_{type, tt|gs}.npz'
    run:
        # DO STUFF

rule with_d:
    input:
        vals_pca    = 'stats/input_{type}_d{amount}.npz',
    output:
        for_cnv     = 'stats/output_{type, tt|gs}_d{amount}.npz'
    run:
        # DO STUFF

I tried defining the output as stats/output_{type}{amount}.npz, but wildcards apparently do not match an empty string. My second idea was to use an "or" pattern such as stats/{output, output_{type}|output_{type}_d{amount}}, but that fails because it nests wildcards inside a wildcard.

Thanks!


Solution

  • Thanks to Eric, I realized that regular expressions can be used in the wildcards of Snakemake rules. The only remaining problem is that Snakemake does not accept "empty" wildcards, but this can be overridden, as explained in this post: https://groups.google.com/g/snakemake/c/S7fTL4jAYIM?pli=1 The solution to my problem is therefore as follows:

    rule both:
        input:
            vals_pca    = 'stats/input_{type}{amount}.npz',
        output:
            for_cnv     = 'stats/output_{type, tt|gs}{amount, .{0}|_d.+}.npz'
        run:
            # DO STUFF
    

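    Snakemake will match {amount, .{0}|_d.+} as either an empty string or a string that begins with _d followed by at least one more character. Note that the matched {amount} value includes the _d prefix, so the same wildcard can be reused unchanged in the input path. I hope this helps somebody.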
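
To see what the two constraints accept without running a workflow, they can be tried out with Python's re module. This is only a rough sketch: the regex below is a hand-written approximation of the output pattern, not what Snakemake builds internally, and parse_output and input_for are hypothetical helper names used only for illustration.

```python
import re

# Hand-written approximation (an assumption, not Snakemake's internal code) of
# 'stats/output_{type, tt|gs}{amount, .{0}|_d.+}.npz' with each wildcard
# turned into a named group that carries its constraint.
OUTPUT_PATTERN = re.compile(
    r"stats/output_(?P<type>tt|gs)(?P<amount>.{0}|_d.+)\.npz"
)


def parse_output(path):
    """Return the wildcard values for an output path, or None if it does not match."""
    match = OUTPUT_PATTERN.fullmatch(path)
    return match.groupdict() if match else None


def input_for(path):
    """Build the input path the merged rule would request for an output path."""
    wildcards = parse_output(path)
    if wildcards is None:
        return None
    return "stats/input_{type}{amount}.npz".format(**wildcards)


if __name__ == "__main__":
    for candidate in (
        "stats/output_tt.npz",      # no suffix: amount is the empty string
        "stats/output_gs_d10.npz",  # suffix present: amount is '_d10'
        "stats/output_tt_d.npz",    # rejected: '_d' needs at least one more character
        "stats/output_xx.npz",      # rejected: type must be 'tt' or 'gs'
    ):
        print(candidate, parse_output(candidate), input_for(candidate))
```

The point of the check is that the `.{0}` alternative lets amount be empty, while the `_d.+` alternative keeps the `_d` prefix inside the wildcard value, which is why the input path needs no separate `_d` literal.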