
Run rule in Snakemake only if another rule fails, for the specific samples that it failed for?


I'm running a metagenomics pipeline in Snakemake. I use MetaSPAdes for my assemblies, but it's not uncommon for MetaSPAdes to fail on particular samples. When it does, I want to run MEGAHIT on only the samples it failed for. Is there any way to create this sort of rule dependency in Snakemake?

For example:

  1. generate a particular file if a rule fails (in this case, assembly with MetaSPAdes). I suppose this would mean that the output of the MetaSPAdes rule needs to be either the contigs or a "this failed" file, so that Snakemake knows not to re-run the rule.
  2. create a list of samples that the rule failed for, and
  3. run a different rule only on this list of samples with failed MetaSPAdes assemblies (in this case, run MEGAHIT instead on those samples).
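Steps 1 and 2 can be sketched in plain Python, outside Snakemake, as below. This is a minimal sketch under assumptions: the `cmd` string stands for whatever MetaSPAdes command you actually run, and the helper names and the `{sample}.metaspades_failed` marker naming are hypothetical. Wiring it into a Snakefile needs more care, because Snakemake expects every declared output of a rule to be created and decides the job graph up front; computing the failed-sample list only after the assemblies have run is the kind of data-dependent step Snakemake checkpoints are meant for.

```python
import os
import subprocess
from typing import Iterable, List


def assemble_or_mark(cmd: str, contigs: str, marker: str) -> bool:
    """Step 1 (hypothetical helper): run the assembler; on failure write a marker file.

    Afterwards either `contigs` or `marker` exists, so the sample can be
    treated as finished either way instead of being retried.
    """
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode == 0 and os.path.exists(contigs):
        return True
    # Remove any partial contigs left behind by the failed run.
    if os.path.exists(contigs):
        os.remove(contigs)
    # Keep the assembler's stderr in the marker so the failure can be inspected later.
    with open(marker, "w") as fh:
        fh.write(result.stderr)
    return False


def failed_samples(samples: Iterable[str], marker_template: str) -> List[str]:
    """Step 2 (hypothetical helper): samples whose assembly left a failure marker."""
    return [s for s in samples if os.path.exists(marker_template.format(sample=s))]
```

The list returned by `failed_samples` is what step 3 would feed to a MEGAHIT rule.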

Has anyone figured out an elegant way to do something like this?


Solution

  • I'm not familiar with the programs you mention, but I don't think you need separate rules for this. You can write a single rule that tries MetaSPAdes first and, if it fails, runs MEGAHIT instead. For example:

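    rule assembly:
        input:
            '{sample}.in',
        output:
            '{sample}.out',
        run:
            import subprocess

            # Placeholder commands: substitute your real MetaSPAdes and MEGAHIT calls.
            # Unlike shell(), Popen does not fill in {input}/{output}, so format the string here.
            cmd = "MetaSPAdes {} > {}".format(input[0], output[0])
            p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

            # Wait for MetaSPAdes to finish; its stderr is kept for inspection
            stdout, stderr = p.communicate()

            if p.returncode != 0:
                # MetaSPAdes failed for this sample: fall back to MEGAHIT
                shell("megahit {input} > {output}")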
    

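    p.communicate() waits for the process to finish and returns its captured stdout and stderr; the exit code is then available as p.returncode. You can inspect stderr and/or the return code to decide what to do next. (Because the shell redirects stdout to the output file, the captured stdout will be empty here.) You will probably need more than the above, but hopefully the idea is clear.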
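The same try-then-fall-back logic can also be written with subprocess.run, which returns the exit code and captured output in one call. The sketch below is standalone Python rather than Snakemake; the function name run_with_fallback and its command-string arguments are hypothetical, and inside a run: block you would pass it the already-formatted MetaSPAdes and MEGAHIT commands.

```python
import subprocess
import sys


def run_with_fallback(primary: str, fallback: str) -> str:
    """Run `primary`; if it exits non-zero, run `fallback` instead.

    Returns "primary" or "fallback" depending on which command produced the
    result. Raises subprocess.CalledProcessError if the fallback also fails.
    """
    first = subprocess.run(primary, shell=True, capture_output=True, text=True)
    if first.returncode == 0:
        return "primary"
    # The primary's stderr is available here, e.g. to decide based on the
    # error message rather than on the exit code alone.
    print(f"primary failed (exit {first.returncode}): {first.stderr.strip()}", file=sys.stderr)
    # check=True makes a failing fallback raise instead of passing silently.
    subprocess.run(fallback, shell=True, check=True)
    return "fallback"
```

Letting the fallback raise means Snakemake would still mark the job as failed when both assemblers fail, rather than leaving an empty output behind.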