Snakemake wrapper: pointing a wildcard to a list in either config.yaml or a txt file


I am trying to use the following snakemake wrapper:

rule get_fastq_pe:
    output:
        # the wildcard name must be accession, pointing to an SRA number
        "data/{accession}_1.fastq",
        "data/{accession}_2.fastq"
    params:
        # optional extra arguments
        extra=""
    threads: 6  # defaults to 6
    wrapper:
        "0.73.0/bio/sra-tools/fasterq-dump"

How can I point the accession wildcard to multiple SRR accessions listed in either a txt file or a config.yaml file?


Solution

  • First, list all the SRR accessions you need in a YAML file, SRR.yml:

    SRR:
      - SRR1234
      - SRR5678
      - SRR2468
      - SRR1357
    

    Then, in your Snakefile, load the YAML file with the configfile keyword:

    configfile: "SRR.yml"
    

    Define a rule all to request every output file; expand() builds one path per combination of accession and read number:

    rule all:
        input: expand("data/{accession}_{RF}.fastq", accession=config["SRR"], RF=["1","2"])
    

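    Then add your rule unchanged. Snakemake fills in the accession wildcard by matching each path requested by rule all against the rule's output patterns: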

    rule get_fastq_pe:
        output:
            # the wildcard name must be accession, pointing to an SRA number
            "data/{accession}_1.fastq",
            "data/{accession}_2.fastq"
        params:
            # optional extra arguments
            extra=""
        threads: 6  # defaults to 6
        wrapper:
            "0.73.0/bio/sra-tools/fasterq-dump"
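
The question also asks about a txt file, which the steps above do not cover. Here is a minimal sketch, assuming a hypothetical file SRR.txt with one accession per line. The helper names read_accessions and fastq_targets are illustrative and not part of Snakemake; fastq_targets only mirrors in plain Python the set of paths that the expand() call above requests.

```python
from itertools import product
from pathlib import Path


def read_accessions(path):
    """Return the accessions listed one per line in a text file."""
    accessions = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and comment lines starting with '#'
        # (the comment convention is an assumption of this sketch).
        if line and not line.startswith("#"):
            accessions.append(line)
    return accessions


def fastq_targets(accessions, reads=("1", "2")):
    """Build one paired-end FASTQ path per (accession, read) combination."""
    return [
        f"data/{accession}_{read}.fastq"
        for accession, read in product(accessions, reads)
    ]
```

Because a Snakefile accepts ordinary Python, these helpers could sit at the top of the Snakefile, with ACCESSIONS = read_accessions("SRR.txt") passed to expand() in place of config["SRR"], or with fastq_targets(ACCESSIONS) used directly as the input of rule all.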