
Snakemake Local-Only


My objective is simple: I'd like to keep certain rules local-only and not upload their output to our Amazon S3 bucket.

In the documentation, I see keep_local=True, which keeps remote files on the local drive after processing. However, this isn't what I'm looking for: it doesn't prevent the rule from uploading the output to Amazon S3 in the first place.
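
For illustration, keep_local is an option on the remote file itself (a sketch using the S3 provider defined below; the bucket path is a placeholder):

# keeps the local copy after processing, but the file is still uploaded to S3
S3.remote("my-bucket/my_file.txt", keep_local=True)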

Snakemake currently acts like a mirror between Amazon S3 and my local drive.

For reference, this is how we've been setting up Amazon S3 with Snakemake.

# run command

snakemake --default-remote-provider S3 --default-remote-prefix '$s3' \
    --use-conda --cores 32 --rerun-incomplete --printshellcmds

# inside the Snakefile

from snakemake.remote.S3 import RemoteProvider as S3RemoteProvider

S3 = S3RemoteProvider(
    access_key_id=config["s3_params"]["access_key_id"],
    secret_access_key=config["s3_params"]["secret_access_key"],
)

# example of rule all

# Runs all rules
rule all:
  input:
    expand(["{sample}.demultiplex_fastqc.zip",
            "{sample}.demultiplex_fastqc.html"],
            sample=samples["sample"]),

    expand(["{sample}.adapterTrim.round2.rmRep.metrics"],
            sample=samples["sample"])

# etc...

Solution

There are at least two options:

1. Continue using snakemake --default-remote-provider S3 --default-remote-prefix '$s3' and wrap the files that should stay local with local(some_file):

   rule some_rule:
       output:
           local("my_file.txt")  # will not be uploaded to S3
       shell:
           "touch {output}"
    
2. Use snakemake without S3 as the default provider and explicitly wrap remote files with S3.remote. Note that S3 remote paths include the bucket name; my-bucket below is a placeholder:

   from snakemake.remote.S3 import RemoteProvider as S3RemoteProvider

   S3 = S3RemoteProvider()

   rule some_rule:
       output:
           S3.remote("my-bucket/my_file.txt")  # will be uploaded to S3
       shell:
           "touch {output}"