pipeline, snakemake, rna-seq

Snakemake reports error: Missing input files for rule all


I'm writing my RNA-seq pipeline using Snakemake. While writing the last part, rule fpkm, which calculates FPKM values from BAM files, I got this error:

MissingInputException in line 3 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Missing input files for rule all:
05_ft/wt2_transcript.gtf
05_ft/wt1_transcript.gtf
05_ft/wt2_gene.gtf
05_ft/epcr1_gene.gtf
05_ft/wt1_gene.gtf
05_ft/epcr2_transcript.gtf
05_ft/epcr1_transcript.gtf
05_ft/epcr2_gene.gtf

Here is my Snakefile:

SBT=["wt1","wt2","epcr1","epcr2"]

rule all:
    input:
        expand("02_clean/{nico}_1.paired.fq", nico=SBT),
        expand("02_clean/{nico}_2.paired.fq", nico=SBT),
        expand("03_align/{nico}.bam", nico=SBT),
        expand("04_exp/{nico}_count.txt", nico=SBT),
        expand("05_ft/{nico}_gene.gtf", nico=SBT),
        expand("05_ft/{nico}_transcript.gtf", nico=SBT)

rule trim:
    input:
        "01_raw/{nico}_1.fastq",
        "01_raw/{nico}_2.fastq"
    output:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_1.unpaired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz",
        "02_clean/{nico}_2.unpaired.fq.gz",
    shell:
        "java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 {input[0]} {input[1]} {output[0]} {output[1]} {output[2]} {output[3]} ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &"

rule gzip:
    input:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz"
    output:
        "02_clean/{nico}_1.paired.fq",
        "02_clean/{nico}_2.paired.fq"
    run:
        shell("gzip -d {input[0]} > {output[0]}")
        shell("gzip -d {input[1]} > {output[1]}")

rule map:
    input:
        "02_clean/{nico}_1.paired.fq",
        "02_clean/{nico}_2.paired.fq"
    output:
        "03_align/{nico}.sam"
    log:
        "logs/map/{nico}.log"
    threads: 40
    shell:
        "hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 {input[0]} -2 {input[1]} -S {output} >{log} 2>&1 &"

rule sort2bam:
    input:
        "03_align/{nico}.sam"
    output:
        "03_align/{nico}.bam"
    threads:30
    shell:
        "samtools sort -@ 20 -m 20G -o {output} {input} "

rule count:
    input:
        "03_align/{nico}.bam"
    output:
        "04_exp/{nico}_count.txt"
    shell:
        "featureCounts -T 10 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o {output} {input}"

rule fpkm:
    input:
        "03_align/{nico}.bam"
    output:
        "05_ft/{nico}_gene.gtf"
        "05_ft/{nico}_transcript.gtf"
    shell:
        "stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -A {output[0]} -o {output[1]} {input}"

Here is my directory structure:

|-- 03_align
|   |-- epcr1.bam
|   |-- epcr1.sam
|   |-- epcr2.bam
|   |-- epcr2.sam
|   |-- wt1.bam
|   |-- wt1.sam
|   |-- wt2.bam
|   `-- wt2.sam
|-- 04_exp

The BAM files already exist because I ran the Snakefile before adding the 'rule fpkm' part.


Solution

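  • The error is caused by a missing comma between the output files in rule fpkm. Without the comma, Python treats the two adjacent string literals as one and concatenates them into a single long string, 05_ft/{nico}_gene.gtf05_ft/{nico}_transcript.gtf. Rule fpkm therefore declares only that one merged output, so no rule produces the 05_ft files requested in rule all, and Snakemake reports them as missing input files. Add the comma: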

    rule fpkm:
        output:
            "05_ft/{nico}_gene.gtf",
            "05_ft/{nico}_transcript.gtf"