Snakemake: MissingInputException with inconsistent naming scheme


I am trying to process MinION cDNA amplicons using Porechop with Minimap2 and I am getting this error.

MissingInputException in line 16 of /home/sean/Desktop/reo/antisera project/20200813/MinIONAmplicon.smk:
Missing input files for rule minimap2:
8413_19_strict/BC01.fastq.g

I understand what the error is telling me; I just don't understand why Snakemake is not trying to run the rule before it. Porechop checks for all possible barcodes and will output more than one fastq file if it finds more than one barcode in the input. However, since I know which barcode I am looking for in each sample, I made a barcodes section in the config.yaml file so I can map barcodes to samples.

I think the error is happening because my target output for Porechop doesn't match the input for minimap2, but I do not know how to correct this, as there can be multiple outputs from Porechop.

I thought I was building a path to the input file for the minimap2 rule, and that when Snakemake discovered the Porechop output was not there it would run Porechop to make it, but that is not what is happening.

Here is my pipeline so far:

configfile: "config.yaml"
rule all:
    input:
        expand("{sample}.bam", sample = config["samples"])
rule porechop_strict:
    input:
        lambda wildcards: config["samples"][wildcards.sample]
    output:
        directory("{sample}_strict/")
    shell:
        "porechop -i {input} -b {output} --barcode_threshold 85 --threads 8 --require_two_barcodes"
rule minimap2:
    input:
        lambda wildcards: "{sample}_strict/" + config["barcodes"][wildcards.sample]
    output:
        "{sample}.bam"
    shell:
        "minimap2 -ax map-ont -t8 ../concensus.fasta {input} | samtools sort -o {output}"

and the YAML file:

samples: {
  '8413_19': relabeled_reads/8413_19.raw.fastq.gz,
  '8417_19': relabeled_reads/8417_19.raw.fastq.gz,
  '8445_19': relabeled_reads/8445_19.raw.fastq.gz,
  '8466_19_104': relabeled_reads/8466_19_104.raw.fastq.gz,
  '8466_19_105': relabeled_reads/8466_19_105.raw.fastq.gz,
  '8467_20': relabeled_reads/8467_20.raw.fastq.gz,
  }
barcodes: {
      '8413_19': BC01.fastq.gz,
      '8417_19': BC02.fastq.gz,
      '8445_19': BC03.fastq.gz,
      '8466_19_104': BC04.fastq.gz,
      '8466_19_105': BC05.fastq.gz,
      '8467_20': BC06.fastq.gz,
    }
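
To make the sample-to-barcode mapping concrete, here is a small plain-Python sketch of the path that the minimap2 input function builds for each sample. The dictionary mirrors only the first two entries of the config above, and `expected_input` and `missing_barcodes` are hypothetical helper names used for illustration, not part of Snakemake.

```python
# Mirrors part of config.yaml above (only two samples shown for brevity).
config = {
    "samples": {
        "8413_19": "relabeled_reads/8413_19.raw.fastq.gz",
        "8417_19": "relabeled_reads/8417_19.raw.fastq.gz",
    },
    "barcodes": {
        "8413_19": "BC01.fastq.gz",
        "8417_19": "BC02.fastq.gz",
    },
}


def expected_input(sample):
    """Hypothetical helper: the path the minimap2 input function asks for."""
    return f"{sample}_strict/" + config["barcodes"][sample]


def missing_barcodes(cfg):
    """Return the samples that have no entry in the barcodes section."""
    return sorted(set(cfg["samples"]) - set(cfg["barcodes"]))


if __name__ == "__main__":
    for sample in config["samples"]:
        print(sample, "->", expected_input(sample))
    print("samples without a barcode:", missing_barcodes(config))
```

For `8413_19` this gives a file inside the `8413_19_strict` directory, which is the kind of path named in the error message.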

Solution

  • I am not 100% sure why this works; I imagine it has to do with the way Snakemake looks at the targets. Here is the solution I found:

    rule minimap2:
        input:
            "{sample}_strict"
        params:
            suffix=lambda wildcards: config["barcodes"][wildcards.sample]
        output:
            "{sample}.bam"
        shell:
            "minimap2 -ax map-ont -t8 ../consensus.fasta\
             {input}/{params.suffix} | samtools sort -o {output}"
    

    By using the params feature in Snakemake I was able to match the correct barcode to each sample name. I am not sure why I could not just do that in the input itself, but once I changed the input to match the output of the previous rule, it worked.
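
A likely explanation is that Snakemake decides which rule to run by matching each requested input path against the output patterns the rules declare. The porechop rule only declares the directory, so a path to a file inside that directory matches no rule, and Snakemake reports it as missing. The plain-Python sketch below is a simplified model of that matching, not Snakemake's actual code: it assumes each wildcard matches one or more characters, that the whole path must match, and that the trailing slash of the directory output is ignored. `pattern_to_regex` and `producing_rules` are hypothetical names.

```python
import re


def pattern_to_regex(pattern):
    """Turn a Snakemake-style output pattern into a regex (simplified model).

    Assumption: every {name} wildcard matches one or more characters.
    Real Snakemake has more rules (wildcard constraints, for example).
    """
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)      # literal text
        else:
            regex += f"(?P<{part}>.+)"    # wildcard
    return re.compile(regex)


# Outputs declared by the two rules in the question. The trailing slash of
# directory("{sample}_strict/") is dropped here for the comparison.
RULE_OUTPUTS = {
    "porechop_strict": "{sample}_strict",
    "minimap2": "{sample}.bam",
}


def producing_rules(path):
    """List the rules whose output pattern matches the whole path."""
    return [
        rule
        for rule, pattern in RULE_OUTPUTS.items()
        if pattern_to_regex(pattern).fullmatch(path)
    ]


if __name__ == "__main__":
    for path in ("8413_19_strict/BC01.fastq.gz", "8413_19_strict", "8413_19.bam"):
        print(path, "->", producing_rules(path))
```

In this model the file inside the directory matches no rule, while the bare directory name matches porechop_strict. That is consistent with the fix above: the input names something a rule actually produces, and the barcode file name is passed through params, which Snakemake does not treat as a file dependency.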