I was wondering what the cleanest way to run two different shell commands depending on if a parameter is given in a snakemake config file. For the moment I am using this setup:
rule deeptools_bamCoverage_pe:
input:
bam="DATA/BAM/{sample_sp}_pe.bam",
bai="DATA/BAM/{sample_sp}_pe.bam.bai"
output:
"DATA/BIGWIG/{sample_sp}_pe_RPGC.bw"
log:
"snakemake_logs/deeptools_bamCoverage/{sample_sp}_pe.log"
run:
if config["excluded_regions"]:
shell("bamCoverage --normalizeUsing RPGC -bl "+config["excluded_regions"]+
" --effectiveGenomeSize $((2913022398-"+config["excluded_regions"].split(".")[0].split("_")[-1]+")) -e -b {input.bam} -o {output} 2>{log}")
else:
shell("bamCoverage --normalizeUsing RPGC --effectiveGenomeSize 2913022398 -e -b {input.bam} -o {output} 2>{log}")
My config file may or may not contain excluded_regions: DATA/GENOMES/DAC_excl_HG38_71570285.bed
depending on if the user wants to filter out some regions from the analysis. 71570285
is the total length of the excluded regions, which is needed for the normalisation calculations (this could be passed in the config file too but I am trying to keep it light to not scare away my potential users).
Is there a cleaner way to present snakemake with two different shell lines that it can run?
In general, not really. If the commands are completely different you are going to need an if/else statement. You could put the if/else statement within the shell code rather than using Python logic but that doesn't change much.
I'd say in this case the two commands are actually very similar and this version looks neater to me:
rule deeptools_bamCoverage_pe:
input:
bam="DATA/BAM/{sample_sp}_pe.bam",
bai="DATA/BAM/{sample_sp}_pe.bam.bai"
output:
"DATA/BIGWIG/{sample_sp}_pe_RPGC.bw"
log:
"snakemake_logs/deeptools_bamCoverage/{sample_sp}_pe.log"
params:
excluded_regions = config["excluded_regions"],
genome_size = 2913022398,
run:
if params.excluded_regions:
bl_file = "-bl " + config["excluded_regions"]
excluded_size = int(config["excluded_regions"].split(".")[0].split("_")[-1])
effective_genome_size = params.genome_size - excluded_size
else:
bl_file = ""
effective_genome_size = params.genome_size
shell("bamCoverage --normalizeUsing RPGC {bl_file} --effectiveGenomeSize {effective_genome_size} -e -b {input.bam} -o {output} 2>{log}")
But you if you find this less legible then stick with what you have - you're the one who has to debug/maintain your code.