I am struggling with a very simple thing. As input to my snakemake pipeline I would like to take a directory, list its contents, and process each file from that directory in parallel. Naively I thought something like this should work:
rule all:
    input:
        "in/{test}.txt"
    output:
        "out/{test}.txt"
    shell:
        "echo {input} >> {output}"
This ends with the error
WorkflowError:
Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.
All the resources I could find start by hard-coding the list of jobs in the script, which is something I want to avoid so the pipeline stays generic. The idea is to just point the pipeline at a directory full of files and let it do its job. Is this possible? It seems fairly simple and intuitive, but I couldn't find an example showing it.
I don't know what command you used for this rule, but the following workflow should serve your purpose:
rule all:
    input:
        expand("out/{prefix}.txt", prefix=glob_wildcards("in/{test}.txt").test)

rule test:
    input:
        "in/{test}.txt"
    output:
        "out/{test}.txt"
    shell:
        "echo {input} >> {output}"
glob_wildcards is a function provided by snakemake to find all the files that match the specified pattern (in/{test}.txt in this case); .test then retrieves the list of strings matched by {test} in those filenames (for example, "ab" in "in/ab.txt").
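Run in a Python shell (outside a Snakefile it has to be imported), it behaves roughly like this; the filenames ab.txt and cd.txt are made up for illustration:

from snakemake.io import glob_wildcards

# Suppose the input directory contains in/ab.txt and in/cd.txt (hypothetical files)
matches = glob_wildcards("in/{test}.txt")
print(matches.test)  # ['ab', 'cd'] -- the substrings matched by {test}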
Then expand fills each of those values into the placeholder wrapped in curly brackets, generating the list of input file names.
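A quick sketch of that same call with a hand-written list (the prefixes are again just examples):

from snakemake.io import expand

# Each value of prefix is substituted into the {prefix} placeholder
print(expand("out/{prefix}.txt", prefix=["ab", "cd"]))
# ['out/ab.txt', 'out/cd.txt']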
So rule all asks for a list of input files corresponding to all the txt files in the in folder, which makes snakemake execute rule test once for every file.
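Since those per-file jobs are independent of each other, snakemake will also run them in parallel when you allow it more than one core, which covers the parallel-processing part of your question:

snakemake --cores 4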