I have a folder where the outputs of a rule are generated, and I am having real trouble running snakemake with it. If I do not specify the outputs in rule all, the rule (called neo4j) is not run at all. If I try running it manually with snakemake neo4j (which I would prefer not to do), then I get an error:
WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.
I tried specifying the outputs of the rule in different ways but none of them worked.
Using expand:
expand('results/neo4j/{sample}/cells.csv', sample=samples),
expand('results/neo4j/{sample}/genes.csv', sample=samples),
expand('results/neo4j/{sample}/cl_nodes.csv', sample=samples),
expand('results/neo4j/{sample}/cl_contains.csv', sample=samples),
expand('results/neo4j/{sample}/cl_isin.csv', sample=samples),
expand('results/neo4j/{sample}/expr_by.csv', sample=samples),
expand('results/neo4j/{sample}/expr_ess.csv', sample=samples)
This generates a very weird error for a completely unrelated rule (called umap):
Missing input files for rule umap: data_files/normalized/minus_2/cl_nodes.csv.csv
The path generation is completely messed up, even though the folders are not connected in any way except that results is the root folder of all of the outputs.
Using dynamic:
dynamic('results/neo4j/{sample}/cells.csv', sample=samples),
dynamic('results/neo4j/{sample}/genes.csv', sample=samples),
dynamic('results/neo4j/{sample}/cl_nodes.csv', sample=samples),
dynamic('results/neo4j/{sample}/cl_contains.csv', sample=samples),
dynamic('results/neo4j/{sample}/cl_isin.csv', sample=samples),
dynamic('results/neo4j/{sample}/expr_by.csv', sample=samples),
dynamic('results/neo4j/{sample}/expr_ess.csv', sample=samples)
Gives an error:
dynamic() got an unexpected keyword argument 'sample'
OK, I tried removing sample=samples, but no luck.
Just directory:
directory('results/neo4j/{sample}/', sample=samples)
Gives error:
directory() got an unexpected keyword argument 'sample'
If I omit sample=samples, it does not work either. If I specify directory under the output of rule all, it does not work either.
The rule I am having difficulty with is below:
rule neo4j:
    input:
        script = 'python/neo4j.py',
        path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
        path_to_umap = 'results/umap/{sample}_umap.csv',
        path_to_mtx = 'data_files/normalized/{sample}.csv'
    output:
        base_neo4j = 'results/neo4j/{sample}'
    shell:
        "python {input.script} -path_to_cl {input.path_to_cl} -path_to_umap {input.path_to_umap} -path_to_mtx {input.path_to_mtx} -base_neo4j {output.base_neo4j}"
The snakemake version is 5.2.2.
Any suggestions would be greatly appreciated.
Update
I modified the Snakemake file using the suggestions of Mali Akmanalp, and now rule all looks like this:
samples, = glob_wildcards('data_files/normalized/{sample}.csv')

rule all:
    input:
        expand('results/pca/img/{sample}_pca.png', sample=samples),
        expand('results/pca/{sample}_pca.csv', sample=samples),
        expand('results/tsne/{sample}_tsne.csv', sample=samples),
        expand('results/umap/{sample}_umap.csv', sample=samples),
        expand('results/umap/img/{sample}_umap.png', sample=samples),
        expand('results/tsne/img/{sample}_tsne.png', sample=samples),
        expand('results/clusters/umap/{sample}_umap_clusters.csv', sample=samples),
        expand('results/clusters/tsne/{sample}_tsne_clusters.csv', sample=samples),
        expand('results/neo4j/{sample}/{file}', sample=samples,
               file=['cells.csv', 'genes.csv', 'cl_contains.csv', 'cl_isin.csv', 'cl_nodes.csv', 'expr_by.csv', 'expr_ess.csv'])
and the neo4j rule looks like this:
rule neo4j:
    input:
        script = 'python/neo4j.py',
        path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
        path_to_umap = 'results/umap/{sample}_umap.csv',
        path_to_mtx = 'data_files/normalized/{sample}.csv',
        base_neo4j = 'results/neo4j/{sample}'
    output:
        'results/neo4j/{sample}/cells.csv',
        'results/neo4j/{sample}/genes.csv',
        'results/neo4j/{sample}/cl_nodes.csv',
        'results/neo4j/{sample}/cl_contains.csv',
        'results/neo4j/{sample}/expr_by.csv',
        'results/neo4j/{sample}/expr_ess.csv',
        'results/neo4j/{sample}/cl_isin.csv'
    shell:
        "python {input.script} -path_to_cl {input.path_to_cl} -path_to_umap {input.path_to_umap} -path_to_mtx {input.path_to_mtx} -base_neo4j {input.base_neo4j}"
With this setup I get the error:
Missing input files for rule neo4j: results/neo4j/plus_1
Update
I removed this line from the neo4j rule: base_neo4j = 'results/neo4j/{sample}', and then changed the output of the rule to:
    output:
        cells = 'results/neo4j/{sample}/cells.csv',
        genes = 'results/neo4j/{sample}/genes.csv',
        cl_nodes = 'results/neo4j/{sample}/cl_nodes.csv',
        cl_contains = 'results/neo4j/{sample}/cl_contains.csv',
        cl_isin = 'results/neo4j/{sample}/cl_isin.csv',
        expr_by = 'results/neo4j/{sample}/expr_by.csv',
        expr_ess = 'results/neo4j/{sample}/expr_ess.csv'
and the shell command to:
    shell:
        "python {input.script} -path_to_cl {input.path_to_cl} -path_to_umap {input.path_to_umap} -path_to_mtx {input.path_to_mtx} -cells {output.cells} -genes {output.genes} -cl_nodes {output.cl_nodes} -cl_contains {output.cl_contains} -cl_isin {output.cl_isin} -expr_by {output.expr_by} -expr_ess {output.expr_ess}"
I do not like passing each output to the script as a separate parameter, but it does not work otherwise. I tried passing just output, but only the first item in the output was passed in and the others were ignored for some reason. I asked a separate question about that:
Snakemake passes only the first path in the output to shell command
Other than that, it is working now.
It's not very easy to diagnose the full issue since you haven't provided the whole Snakefile, but here is what I can infer from what you specified:
The error message is unfortunately a bit misleading, but the gist of it is that snakemake starts from a list of targets. These targets are either files you specified through the command line, or files that are the input of the topmost rule of a snakefile. Usually this rule is named "all" or "main". Here you would specify the final list of files to be generated by default. An example for your case would be:
rule all:
    input: expand('results/neo4j/{sample}/{file}', sample=samples, file=['cells.csv', 'genes.csv', ...])

rule neo4j:
    ...
    output: 'results/neo4j/{sample}/cells.csv', 'results/neo4j/{sample}/genes.csv'...
Snakemake looks at the input of all to figure out all the files to be generated, then figures out which rule(s) to run (neo4j) and with which wildcard values in order to generate them, then which rules to use to generate the inputs of those rules, and so on. So the very last rule, i.e. the "target rule" all, is where everything starts, which is why you can't use wildcards there.
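The matching step described above can be sketched in plain Python. This is only an illustration, not Snakemake's actual implementation: the helper names pattern_to_regex and match_wildcards are hypothetical, and the only Snakemake behaviour assumed is that a wildcard without constraints matches the regular expression .+ by default.

```python
import re


def pattern_to_regex(pattern):
    """Turn an output pattern such as 'results/neo4j/{sample}/cells.csv'
    into a compiled regex with one named group per wildcard (hypothetical helper)."""
    # re.split with a capturing group yields: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    seen = set()
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal path text
        elif part in seen:
            regex += f"(?P={part})"         # repeated wildcard must have the same value
        else:
            seen.add(part)
            regex += f"(?P<{part}>.+)"      # assumed default: wildcard matches '.+'
    return re.compile(regex)


def match_wildcards(pattern, target):
    """Return the wildcard values that make `pattern` equal `target`, or None."""
    m = pattern_to_regex(pattern).fullmatch(target)
    return m.groupdict() if m else None


# A concrete target requested by rule all fixes the wildcard value for neo4j.
hit = match_wildcards("results/neo4j/{sample}/cells.csv",
                      "results/neo4j/plus_1/cells.csv")
# A target that does not fit the pattern is not produced by this rule.
miss = match_wildcards("results/neo4j/{sample}/cells.csv",
                       "results/umap/plus_1_umap.csv")
print(hit, miss)
```

Without a concrete file name there is nothing to match the pattern against, which is why a rule whose output contains wildcards cannot be requested directly as a target. Note also that in this sketch, as with the unconstrained default, .+ matches slashes too, so a pattern can match paths in nested directories that it was never meant to produce.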
Notice that the output for neo4j consists only of patterns with wildcards (they have {} in them and describe a hypothetical pattern that may match a file), whereas the input for all is expanded to concrete file names (like 'results/neo4j/123/cells.csv').
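To make the difference between a pattern and concrete file names tangible, here is a rough plain-Python imitation of what expand does for the simple case used here: it fills the pattern with every combination of the given values. The function expand_sketch is a hypothetical stand-in, not Snakemake's own code, and the sample names are simply the ones that appear in the error messages above.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values
    (a simplified stand-in for Snakemake's expand, for illustration only)."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


samples = ["plus_1", "minus_2"]        # sample names taken from the error messages
files = ["cells.csv", "genes.csv"]     # shortened list of output files

targets = expand_sketch("results/neo4j/{sample}/{file}", sample=samples, file=files)
for path in targets:
    print(path)

# Pitfall: if the values already carry the extension, the pattern must not add it again.
doubled = expand_sketch("results/neo4j/{sample}/{file}.csv",
                        sample=["plus_1"], file=["cells.csv"])
print(doubled)
```

Every entry in targets is a plain path with no braces left, which is what a target rule needs. The last call shows how a pattern ending in {file}.csv combined with values that already end in .csv yields a doubled extension.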
People often get this error because they don't have an all rule at the top of their snakefile, which leads snakemake to pick whatever other rule comes first as the target, and that rule happens to have a wildcard.
You probably shouldn't need dynamic / directory / etc for something like this.