Tags: python, linux, snakemake

Apply a Snakemake rule to all generated files


I want to run a simple script, `script.py`, which performs some calculations and periodically writes out a step_000n.txt file, where n depends on how long the script runs. I would then like Snakemake to run another rule on all of the generated files. What would be the proper input in the Snakefile? That is:

1. run script.py
2. get step_000{1,2,3,4 ..}.txt (n is variable and not known in advance)
3. apply `process.py -in step_000{n}.txt -out step_000{n}.png` to all step_000{1,2,3,4 ..}.txt
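
Steps 2 and 3 boil down to mapping every step file to its own `process.py` call. As a plain-Python sketch outside Snakemake (the helper name `plan_commands` is hypothetical, and it assumes the files are named step_<digits>.txt and all sit in one directory), the mapping could look like this:

```python
import os
import re
from typing import List

# Assumed naming scheme: step_<digits>.txt, e.g. step_0001.txt
STEP_RE = re.compile(r"^step_(\d+)\.txt$")


def plan_commands(workdir: str) -> List[List[str]]:
    """Hypothetical helper: build one `process.py -in X -out Y` command per step file."""
    found = []
    for name in os.listdir(workdir):
        match = STEP_RE.match(name)
        if match:
            # Keep the step number so the files can be sorted numerically
            found.append((int(match.group(1)), name))

    commands = []
    for _, name in sorted(found):  # numeric order: step 2 comes before step 10
        txt = os.path.join(workdir, name)
        png = txt[: -len(".txt")] + ".png"  # same stem, .png extension
        commands.append(["process.py", "-in", txt, "-out", png])
    return commands
```

Each command could then be run with `subprocess.run(cmd, check=True)`, assuming process.py is executable and on the PATH. Inside a Snakefile, Snakemake's own `checkpoint` rules and `glob_wildcards()` address the same problem of outputs that are only known after a rule has run.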

My obviously wrong attempt is below:

    rule all:
        input: expand("{step}.png", step=list(map(lambda x: x.split(".")[0], glob.glob("model0*.txt"))))

    rule txt:
        input: "{step}.txt"
        output: "{step}.png"
        shell:
            "process.py -in {input} -out {output}"

    rule first:
        output: "{step}.txt"
        script: "script.py"

I could not figure out how to define the output target of rule `first` here, since the number of step files is only known after script.py has run.


Solution

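  • Since the number of step files is not known in advance, I would have script.py write all the step_000n.txt files to a dedicated directory, declare that directory as the rule's output with `directory()`, and then process all the files in that directory in a single rule. Something like: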

    rule all:
        input:
            'processed.txt',
    
    
    rule split:
        output:
            directory('processed_dir'),
        shell:
            r"""
            # Write out step_0001.txt, step_0002.txt, ..., step_000n.txt
            # in output directory `processed_dir`
            mkdir {output}
            script.py ...
            """
    
    
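    rule process:
        input:
            indir='processed_dir',
        output:
            out='processed.txt',
        shell:
            r"""
            # process.py takes one input and one output, so call it once per step file;
            # each step_000n.png is written next to its step_000n.txt
            for f in {input.indir}/step_*.txt; do
                process.py -in "$f" -out "${{f%.txt}}.png"
            done
            # processed.txt is an empty flag file marking that all step files were processed
            touch {output.out}
            """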