snakemake

How to get a rule that would work the same on a directory and its sub-directories


I am trying to write a rule that works the same way on a directory and on any of its sub-directories, so that I don't have to repeat the rule several times. I would like to have access to the name of the sub-directory if there is one.

My approach was to make the sub-directory optional. Given that wildcards can be made to accept an empty string by explicitly giving the ".*" pattern, I therefore tried the following rule:

rule test_optional_sub_dir:
    input:
        "{adir}/{bdir}/a.txt"
    output:
        "{adir}/{bdir,.*}/b.txt"
    shell:
        "cp {input} {output}"

I was hoping that this rule would match both A/b.txt and A/B/b.txt.

However, A/b.txt doesn't match the rule. Neither does A//b.txt, which would be the literal omission of bdir; I guess the double / gets removed before the matching happens.
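The mismatch can be reproduced outside Snakemake with plain regular expressions. The sketch below assumes that an unconstrained wildcard is translated to ".+" and a constrained one to its own regex; the two compiled patterns and the helper match_wildcards are hypothetical stand-ins for illustration, not Snakemake's actual internals. It suggests that even when bdir is allowed to be empty, both literal slashes around it still have to be present in the target:

```python
import re

# Rough stand-ins for the regexes built from the two output patterns
# (assumption: "{adir}" -> ".+", "{bdir,.*}" -> ".*", "{path,.*}" -> ".*").
TWO_WILDCARDS = re.compile(r"(?P<adir>.+)/(?P<bdir>.*)/b\.txt$")
ONE_WILDCARD = re.compile(r"(?P<path>.*)/b\.txt$")


def match_wildcards(pattern, target):
    """Return the wildcard values for target, or None if the pattern does not match."""
    m = pattern.match(target)
    return m.groupdict() if m else None


# "A/b.txt" has only one "/", but the two-wildcard pattern requires two.
print(match_wildcards(TWO_WILDCARDS, "A/b.txt"))
print(match_wildcards(TWO_WILDCARDS, "A/B/b.txt"))
# The single-wildcard pattern absorbs any number of directory levels.
print(match_wildcards(ONE_WILDCARD, "A/b.txt"))
print(match_wildcards(ONE_WILDCARD, "A/B/b.txt"))
```

This only models the string matching; whatever path handling Snakemake applies to a target such as A//b.txt is not covered here.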

The following rule works with both A/b.txt and A/B/b.txt:

rule test_optional_sub_dir2:
    input:
        "{path}/a.txt"
    output:
        "{path,.*}/b.txt"
    shell:
        "cp {input} {output}"

but the problem in this case is that I don't have easy access to the names of the directories in path. I could use pathlib.Path to break {path} up, but this seems overly complicated.

Is there a better way to accomplish what I am trying to do?

Thanks a lot for your help.


Solution

  • How exactly you want to use the sub-directory in your rule might determine the best way to do this. Maybe something like:

    def get_subdir(path):
        dirs = path.split('/')
        if len(dirs) > 1:
            return dirs[1]
        else:
            return ''
    
    rule myrule:
        input:
            "{dirpath}/a.txt"
        output:
            "{dirpath}/b.txt"
        params:
            subdir = lambda wildcards: get_subdir(wildcards.dirpath)
        shell:
            # {params.subdir} is available in the command, for example:
            "echo 'subdir: {params.subdir}'; cp {input} {output}"
    

    Of course, if your rule uses "run" or "script" instead of "shell", you don't even need that function or the subdir param: you can work out the subdir directly from the wildcard that gets passed into the script.
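If the string splitting feels fragile, a pathlib-based variant is one possible alternative. This is a minimal sketch, not part of the original answer: the helper name split_dirpath is hypothetical, and it assumes the wildcard value is a relative POSIX-style path such as "A" or "A/B". Unlike get_subdir above, it returns everything below the top-level directory, so "A/B/C" yields "B/C" rather than "B".

```python
from pathlib import PurePosixPath


def split_dirpath(dirpath):
    """Split a wildcard value such as "A/B" into (top_dir, subdir).

    subdir is "" when dirpath has no sub-directory.
    """
    parts = PurePosixPath(dirpath).parts
    top_dir = parts[0] if parts else ""
    # Everything below the top-level directory, re-joined with "/".
    subdir = "/".join(parts[1:])
    return top_dir, subdir


for value in ("A", "A/B", "A/B/C"):
    print(value, "->", split_dirpath(value))
```

In the Snakefile this could be wired in the same way as before, for example with subdir = lambda wildcards: split_dirpath(wildcards.dirpath)[1] under params, or called directly on wildcards.dirpath inside a "run" block.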