I'm creating a pipeline to run RdTest for each chromosome. However, I want to handle chromosomes X and Y separately, with different options, so I used Snakemake's wildcard_constraints. But Snakemake gives me a "Missing input files" error. Could someone help me fix it?
rule all:
    input:
        expand(rules.RdTest_autosomes.output, source=ALGO, chrom=CONTIG_LIST),

rule RdTest_autosomes:
    input:
        bed = OUTPUT_DIR + "/GenerateBatchMetrics/All_Beds/SEP/" + BATCH + '.{chrom}.{source}.bed',
        bincov = rules.ZPaste.output.matrix_file,
        medmat = rules.T_CalcMedCov.output,
    output:
        metrics = OUTPUT_DIR + '/GenerateBatchMetrics/Metrics/' + BATCH + '.{source}.{chrom}.metrics',
    wildcard_constraints:
        chrom = '(' + '|'.join(AUTOSOMAL) + ')'  # <- from chr1 to chr22
    params:
        prefix = BATCH + '.{source}.{chrom}',
        famfile = config['base']['fam_file'],
        sample = OUTPUT_DIR + "/sample_list.txt",
        op_dir = OUTPUT_DIR + '/GenerateBatchMetrics/Metrics/',
    singularity:
        "sif_images/rdtest.sif"
    threads: workflow.cores * 0.4
    shell:
        """
        Rscript src/RdTest/RdTest.R \\
            -b {input.bed} \\
            -n {params.prefix} \\
            -o {params.op_dir} \\
            -c {input.bincov} \\
            -m {input.medmat} \\
            -f {params.famfile} \\
            -w {params.sample}
        touch {output}
        """

With your wildcard_constraints, the rule can process only the AUTOSOMAL contigs. Since rule all also requests the outputs for X and Y, and no rule can produce them, you get the missing input files error. You could write a rule RdTest_sexchroms similar to RdTest_autosomes, but constrained to X and Y and with the parameters appropriate to the sex chromosomes. However, I think it would be better to remove the constraints altogether and use parameters that depend on the value of {chrom}. Something like:

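rule RdTest_autosomes:
    input:
        ...
    output:
        ...
    params:
        # Pick the value from the contig; the names must match CONTIG_LIST exactly
        # (if your contigs are named 'chrX'/'chrY', use those instead of 'X'/'Y').
        arg1=lambda wc: 'some-value' if wc.chrom in AUTOSOMAL \
            else 'another-value' if wc.chrom in ['X', 'Y'] \
            else None,
        ... other params
    shell:
        """
        Rscript src/RdTest/RdTest.R \\
            -p {params.arg1} \\
            ...
        """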