Tags: python, snakemake, directed-acyclic-graphs, orchestration

Prevent rules from rerunning when intermediate file is updated


Let's say I have two rules in my Snakemake file:

  1. The first rule fetches a remote file and makes a temporary local copy
  2. The second rule uses the local file and performs an expensive task

Now let's say I ran this pipeline to completion, and I want to add a third rule and re-run the pipeline.

  1. The third rule uses the same local file and performs a different task

Is there a way I can run this updated pipeline without rerunning rule #2? The issue is that when I attempt to complete rule #3, rule #1 is triggered and then rule #2 wants to re-run because the intermediate local file has been updated.

I know that techniques like using touch or ancient exist, but I'm not sure how or even if they can apply here. Is there a way to specifically tag rule #1 as not making an update?


Solution

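  • Wrapping the input files for rules 2 and 3 in ancient() should prevent them from reacting to file updates: Snakemake ignores the modification time of an input marked ancient and treats it as older than any output, so a refreshed copy alone does not trigger a rerun. Something like this: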

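    rule a:
        # Fetches the remote file; every run of this rule refreshes a.txt's timestamp
        output: 'a.txt'
        shell: 'curl some_url > {output}'

    rule b:
        # ancient() tells Snakemake to ignore a.txt's modification time for this rule
        input: ancient('a.txt')
        # do something

    rule c:
        input: ancient('a.txt')
        # do something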
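
To make the reasoning concrete, here is a toy Python model of a timestamp-based rerun check. It is only a sketch and not Snakemake's actual implementation: the `needs_rerun` helper and its `ancient` parameter are hypothetical names made up for illustration, and real Snakemake may also consider other rerun triggers. It shows why a re-fetched `a.txt` makes rule b look stale, and why ignoring the input's timestamp avoids that.

```python
import os
import tempfile


def needs_rerun(input_path, output_path, ancient=False):
    """Hypothetical helper: decide whether a rule looks out of date.

    A missing output always means the rule must run. Otherwise the rule
    is stale when the input is newer than the output, unless the input
    is treated as ancient, in which case its timestamp is ignored.
    """
    if not os.path.exists(output_path):
        return True
    if ancient:
        return False
    return os.path.getmtime(input_path) > os.path.getmtime(output_path)


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")  # local copy written by rule a
    b = os.path.join(tmp, "b.txt")  # stand-in for rule b's expensive output
    for path in (a, b):
        open(path, "w").close()

    # Set explicit timestamps: b.txt was built first, a.txt was re-fetched later.
    os.utime(b, (1000, 1000))
    os.utime(a, (2000, 2000))

    plain = needs_rerun(a, b)                  # input newer than output
    wrapped = needs_rerun(a, b, ancient=True)  # input timestamp ignored

print(plain, wrapped)  # → True False
```

Note that in this model, as with ancient() in the rules above, a missing output still triggers the rule, so rule c runs the first time while rule b is left alone.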