
How to fix "IndexError: list index out of range" in Snakemake


I am setting up a Snakemake pipeline for the first time and am running into an issue with the code.

I have tried to keep it really simple to begin with.


configfile: "config.yaml"
SAMPLES, = glob_wildcards("data/{sample}_L008_R1_001.fastq.gz")

rule all:
    input:
        expand("umi_labeled_fastq/{sample}.umi-extract.fq.gz", sample=SAMPLES)
rule umi_tools_extract:
    input:
        "data/{sample}_L008_R1_001.fastq.gz"
    output:
        "umi_labeled_fastq/{sample}.umi-extract.fq.gz"
    shell:
        "umi_tools extract --extract-method=regex --bc-pattern=”(?P<umi_1>.{6})(?P<discard_1>.{4}).*” -I {input} -S {output}"

Here is the output I receive:

Job counts:
    count   jobs
    1   all
    6   umi_tools_extract
    7

[Thu May 16 16:55:05 2019]
rule umi_tools_extract:
    input: data/YL5_S221_L008_R1_001.fastq.gz
    output: umi_labeled_fastq/YL5_S221.umi-extract.fq.gz
    jobid: 3
    wildcards: sample=YL5_S221

RuleException in line 9 of /home/ryan/lexogen/test2.snakefile:
IndexError: list index out of range

If I remove this part of the shell command, I get no error:

--bc-pattern=”(?P<umi_1>.{6})(?P<discard_1>.{4}).*”

How do I get around this?


Solution

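  • You need to escape the braces in {6} and {4} by doubling them ({{6}} and {{4}}). Snakemake fills in the shell command using Python's str.format-style syntax, so every {...} is treated as a placeholder: {input} and {output} are replaced with file names, while a bare number such as {6} is read as a positional-argument index. No positional arguments are supplied, so the lookup fails with "IndexError: list index out of range". Doubled braces are emitted as single literal braces.
  • Also wrap the pattern in straight single quotes. The typographic quotes (”) in the command above do not quote anything in the shell, so the parentheses and < characters in the regex would be interpreted by the shell itself instead of being passed to umi_tools.

    shell:
        "umi_tools extract --extract-method=regex --bc-pattern='(?P<umi_1>.{{6}})(?P<discard_1>.{{4}}).*' -I {input} -S {output}"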
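To see why doubling the braces works, the substitution can be reproduced outside Snakemake. The sketch below uses Python's own str.format as a stand-in for Snakemake's shell formatter (an assumption: both follow the same brace rules). The names BROKEN, FIXED, VALUES and render, and the sample file names, are hypothetical and only for illustration.

```python
# Sketch: reproduce Snakemake's brace handling with plain str.format.
# Assumption: Snakemake's shell formatter treats braces like str.format.
# File names in VALUES are hypothetical placeholders.

BROKEN = ("umi_tools extract --extract-method=regex "
          "--bc-pattern='(?P<umi_1>.{6})(?P<discard_1>.{4}).*' "
          "-I {input} -S {output}")
FIXED = ("umi_tools extract --extract-method=regex "
         "--bc-pattern='(?P<umi_1>.{{6}})(?P<discard_1>.{{4}}).*' "
         "-I {input} -S {output}")

VALUES = {
    "input": "data/S1_L008_R1_001.fastq.gz",
    "output": "umi_labeled_fastq/S1.umi-extract.fq.gz",
}


def render(template: str, **values: str) -> str:
    """Substitute {name} placeholders; {{ and }} become literal braces."""
    return template.format(**values)


if __name__ == "__main__":
    try:
        render(BROKEN, **VALUES)
    except IndexError as err:
        # {6} is parsed as "positional argument 6", and none were passed.
        print("broken template fails with", type(err).__name__)

    # Doubled braces survive as single braces in the final command.
    print(render(FIXED, **VALUES))
```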