I am building a snakemake workflow in which certain wildcards(populations) have extra steps not shared by all wildcards. I have 8 populations that run a pedigree-based evaluation, and 6 of these 8 populations run, in addition to the pedigree evaluation, a genomic evaluation. My workflow includes a Python script that only generates a genotype file in the case of a population in the genomic workflow. A summary of the issue is given below. The population CHA runs with the genomic workflow, and the population BEL works with the pedigree-based workflow. In the case of the BEL wildcard, the Python script produces the [dlistAnim, phen_file]
files, and in the case of the CHA wildcard, the Python script produces [dlistAnim, phen_file, gen_file]
.
genomic_breeds = {"CHA": "CHAROLAIS"}
breeds = {"CHA": "CHAROLAIS", "BEL":"BELGIAN BLUE"}
rule extract_phenotype_data:
input:
params:
config = "../config_file.yml",
breed =f"{{breed}}"
output:
dlistAnim=f"../listcodeall{{breed}}.txt",
phen_file=f"../phen_{{breed}}.txt",
gen_file=f"../genotypes_{{breed}}.txt"
run:
cmd = f"python /../extract_phenotype_data_for_populations.py --config {params.config} --breed {breeds[params.breed]}"
shell(cmd)
The file gen_file
is required by the steps after the pedigree-based evaluation that should run for only the genomic breeds (CHA) wildcards.
I have tried the dynamic file command, however, I run into a bug that refers me to https://github.com/snakemake/snakemake/issues/823.
I would expect a workflow that runs for all wildcards to a certain level and then continues for a subset of the wildcards to the end. Additionally, the workflow should account for the files that may not be present in the pedigree-based workflow.
The snakemake version is 7.25.0
I found a solution to this question by using two rules in combination with the wildcard constraints directive.
The first rule produces two outputs and only works on wildcards not in the genomic_breeds
dictionary.
rule extract_phenotype_data:
wildcard_constraints:
breed='|'.join([breed for breed in breeds.keys() if not breed
in genomic_breeds])
input:
params:
config = "../config_file.yml",
breed =f"{{breed}}"
output:
dlistAnim=f"../listcodeall{{breed}}.txt",
phen_file=f"../phen_{{breed}}.txt",
run:
cmd = f"python /../extract_phenotype_data_for_populations.py
--config {params.config} --breed {breeds[params.breed]}"
shell(cmd)
The second rule uses the wildcard constraints directive as well and only works on wildcards in the genomic_breeds
dictionary.
rule extract_phenotype_data_genomic:
wildcard_constraints:
breed='|'.join(list(genomic_breeds.keys()))
input:
params:
config = "../config_file.yml",
breed =f"{{breed}}"
output:
dlistAnim=f"../listcodeall{{breed}}.txt",
phen_file=f"../phen_{{breed}}.txt",
gen_file=f"../genotypes_{{breed}}.txt"
run:
cmd = f"python /../extract_phenotype_data_for_populations.py --config {params.config} --breed {breeds[params.breed]}"
shell(cmd)
The output of both rules are different in that only one of the rules produces the ../genotypes_{{breed}}.txt
file.
Rule All can be defined as
rule all:
input:
expand(f"../phen_{{breed}}.txt", breed=breeds.keys()),
expand(f"../listcodeall{{breed}}.txt", breed=breeds.keys()),
expand(f"../genotypes_{{breed}}.txt", breed =genomic_breeds.keys()),
Perhaps this helps someone someday.