Search code examples
pipelinesnakemakerna-seq

Is Snakemake run step by step or parallel?


I have been writting the RNA-seq pipeline by using Snakemake for a week. I still dno't know the working order.The version of snakemake is 5.4.4

My RNA-seq pipeline is composed of five part,so I write five rule(rule trim, rule alignemnt, rule sort_to_bam, rule fpkm, rule count).when I write a rule, I will test it by running it。And finally I finished it.And it running fine when I test every rule step by step.Here is my Snakefile:

SBT=["wt1","wt2","epcr1","epcr2"]

ruleorder: trim > map > sort2bam > fpkm > count
rule all:
    input:
        expand("02_clean/{nico}_1.paired.fq.gz", nico=SBT),
        expand("02_clean/{nico}_2.paired.fq.gz", nico=SBT),
        expand("03_align/{nico}.sam", nico=SBT),
        expand("04_exp/{nico}_count.txt", nico=SBT),
        expand("05_ft/{nico}_gene.gtf", nico=SBT),
        expand("05_ft/{nico}_transcript.gtf", nico=SBT)

rule trim:
    input:
        "01_raw/{nico}_1.fastq",
        "01_raw/{nico}_2.fastq"
    output:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_1.unpaired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz",
        "02_clean/{nico}_2.unpaired.fq.gz",
    threads: 20
    shell:
        "java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 {input[0]} {input[1]} {output[0]} {output[1]} {output[2]} {output[3]} ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &"

rule map:
    input:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz"
    output:
        "03_align/{nico}.sam"
    log:
        "logs/map/{nico}.log"
    threads: 40
    shell:
        "hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 {input[0]} -2 {input[1]} -S {output} >{log} 2>&1 &"

rule sort2bam:
    input:
        "03_align/{nico}.sam"
    output:
        "03_align/{nico}.bam"
    threads:20
    shell:
        "samtools sort -@ 20 -m 10G -o {output} {input}"

rule count:
    input:
        "03_align/{nico}.bam"
    output:
        "04_exp/{nico}_count.txt"
    shell:
        "featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o {output} {input}"


rule fpkm:
    input:
        "03_align/{nico}.bam"
    output:
        "05_ft/{nico}_gene.gtf",
        "05_ft/{nico}_transcript.gtf"
    threads: 40
    shell:
        "stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A {output[0]} -o {output[1]} {input}"

raw_data is showing as follows:

(py3) root@SBT:~/s/r/snakemake/my_rnaseq_data/01_raw# tree
.
|-- epcr1_1.fastq
|-- epcr1_2.fastq
|-- epcr2_1.fastq
|-- epcr2_2.fastq
|-- wt1_1.fastq
|-- wt1_2.fastq
|-- wt2_1.fastq
`-- wt2_2.fastq

Then I want to test the pipeline from the raw_data,deleting all existed intermediate file which I previous test step by step.Here is my dry_run results:

Building DAG of jobs...
Job counts:
        count   jobs
        1       all
        4       count
        4       fpkm
        4       map
        4       sort2bam
        4       trim
        21

[Tue Apr 30 03:09:28 2019]
rule trim:
    input: 01_raw/epcr1_1.fastq, 01_raw/epcr1_2.fastq
    output: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_1.unpaired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr1_2.unpaired.fq.gz
    jobid: 3
    wildcards: nico=epcr1

java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/epcr1_1.fastq 01_raw/epcr1_2.fastq 02_clean/epcr1_1.paired.fq.gz 02_clean/epcr1_1.unpaired.fq.gz 02_clean/epcr1_2.paired.fq.gz 02_clean/epcr1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &

[Tue Apr 30 03:09:28 2019]
rule trim:
    input: 01_raw/epcr2_1.fastq, 01_raw/epcr2_2.fastq
    output: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_1.unpaired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 02_clean/epcr2_2.unpaired.fq.gz
    jobid: 4
    wildcards: nico=epcr2

java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/epcr2_1.fastq 01_raw/epcr2_2.fastq 02_clean/epcr2_1.paired.fq.gz 02_clean/epcr2_1.unpaired.fq.gz 02_clean/epcr2_2.paired.fq.gz 02_clean/epcr2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &

[Tue Apr 30 03:09:28 2019]
rule trim:
    input: 01_raw/wt1_1.fastq, 01_raw/wt1_2.fastq
    output: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_1.unpaired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt1_2.unpaired.fq.gz
    jobid: 1
    wildcards: nico=wt1

java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/wt1_1.fastq 01_raw/wt1_2.fastq 02_clean/wt1_1.paired.fq.gz 02_clean/wt1_1.unpaired.fq.gz 02_clean/wt1_2.paired.fq.gz 02_clean/wt1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &

[Tue Apr 30 03:09:28 2019]
rule trim:
    input: 01_raw/wt2_1.fastq, 01_raw/wt2_2.fastq
    output: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_1.unpaired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/wt2_2.unpaired.fq.gz
    jobid: 2
    wildcards: nico=wt2

java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/wt2_1.fastq 01_raw/wt2_2.fastq 02_clean/wt2_1.paired.fq.gz 02_clean/wt2_1.unpaired.fq.gz 02_clean/wt2_2.paired.fq.gz 02_clean/wt2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &

[Tue Apr 30 03:09:28 2019]
rule map:
    input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
    output: 03_align/wt1.sam
    log: logs/map/wt1.log
    jobid: 5
    wildcards: nico=wt1

hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/wt1_1.paired.fq.gz -2 02_clean/wt1_2.paired.fq.gz -S 03_align/wt1.sam >logs/map/wt1.log 2>&1 &

[Tue Apr 30 03:09:28 2019]
rule map:
    input: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz
    output: 03_align/epcr2.sam
    log: logs/map/epcr2.log
    jobid: 8
    wildcards: nico=epcr2

hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/epcr2_1.paired.fq.gz -2 02_clean/epcr2_2.paired.fq.gz -S 03_align/epcr2.sam >logs/map/epcr2.log 2>&1 &

[Tue Apr 30 03:09:28 2019]
rule map:
    input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
    output: 03_align/wt2.sam
    log: logs/map/wt2.log
    jobid: 6
    wildcards: nico=wt2

hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/wt2_1.paired.fq.gz -2 02_clean/wt2_2.paired.fq.gz -S 03_align/wt2.sam >logs/map/wt2.log 2>&1 &

[Tue Apr 30 03:09:28 2019]
rule map:
    input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
    output: 03_align/epcr1.sam
    log: logs/map/epcr1.log
    jobid: 7
    wildcards: nico=epcr1

hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/epcr1_1.paired.fq.gz -2 02_clean/epcr1_2.paired.fq.gz -S 03_align/epcr1.sam >logs/map/epcr1.log 2>&1 &

[Tue Apr 30 03:09:28 2019]
rule sort2bam:
    input: 03_align/epcr1.sam
    output: 03_align/epcr1.bam
    jobid: 19
    wildcards: nico=epcr1

samtools sort -@ 20 -m 10G -o 03_align/epcr1.bam 03_align/epcr1.sam

[Tue Apr 30 03:09:28 2019]
rule sort2bam:
    input: 03_align/epcr2.sam
    output: 03_align/epcr2.bam
    jobid: 20
    wildcards: nico=epcr2

samtools sort -@ 20 -m 10G -o 03_align/epcr2.bam 03_align/epcr2.sam

[Tue Apr 30 03:09:28 2019]
rule sort2bam:
    input: 03_align/wt1.sam
    output: 03_align/wt1.bam
    jobid: 17
    wildcards: nico=wt1

samtools sort -@ 20 -m 10G -o 03_align/wt1.bam 03_align/wt1.sam

[Tue Apr 30 03:09:28 2019]
rule sort2bam:
    input: 03_align/wt2.sam
    output: 03_align/wt2.bam
    jobid: 18
    wildcards: nico=wt2

samtools sort -@ 20 -m 10G -o 03_align/wt2.bam 03_align/wt2.sam

[Tue Apr 30 03:09:28 2019]
rule count:
    input: 03_align/wt2.bam
    output: 04_exp/wt2_count.txt
    jobid: 10
    wildcards: nico=wt2

featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/wt2_count.txt 03_align/wt2.bam

[Tue Apr 30 03:09:28 2019]
rule count:
    input: 03_align/wt1.bam
    output: 04_exp/wt1_count.txt
    jobid: 9
    wildcards: nico=wt1

featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/wt1_count.txt 03_align/wt1.bam

[Tue Apr 30 03:09:28 2019]
rule count:
    input: 03_align/epcr2.bam
    output: 04_exp/epcr2_count.txt
    jobid: 12
    wildcards: nico=epcr2

featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/epcr2_count.txt 03_align/epcr2.bam

[Tue Apr 30 03:09:28 2019]
rule fpkm:
    input: 03_align/wt1.bam
    output: 05_ft/wt1_gene.gtf, 05_ft/wt1_transcript.gtf
    jobid: 13
    wildcards: nico=wt1

stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/wt1_gene.gtf -o 05_ft/wt1_transcript.gtf 03_align/wt1.bam

[Tue Apr 30 03:09:28 2019]
rule fpkm:
    input: 03_align/epcr2.bam
    output: 05_ft/epcr2_gene.gtf, 05_ft/epcr2_transcript.gtf
    jobid: 16
    wildcards: nico=epcr2

stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/epcr2_gene.gtf -o 05_ft/epcr2_transcript.gtf 03_align/epcr2.bam

[Tue Apr 30 03:09:28 2019]
rule fpkm:
    input: 03_align/wt2.bam
    output: 05_ft/wt2_gene.gtf, 05_ft/wt2_transcript.gtf
    jobid: 14
    wildcards: nico=wt2

stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/wt2_gene.gtf -o 05_ft/wt2_transcript.gtf 03_align/wt2.bam

[Tue Apr 30 03:09:28 2019]
rule count:
    input: 03_align/epcr1.bam
    output: 04_exp/epcr1_count.txt
    jobid: 11
    wildcards: nico=epcr1

featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/epcr1_count.txt 03_align/epcr1.bam

[Tue Apr 30 03:09:28 2019]
rule fpkm:
    input: 03_align/epcr1.bam
    output: 05_ft/epcr1_gene.gtf, 05_ft/epcr1_transcript.gtf
    jobid: 15
    wildcards: nico=epcr1

stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/epcr1_gene.gtf -o 05_ft/epcr1_transcript.gtf 03_align/epcr1.bam

[Tue Apr 30 03:09:28 2019]
localrule all:
    input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt2_1.paired.fq.gz, 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr2_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 03_align/wt1.sam, 03_align/wt2.sam, 03_align/epcr1.sam, 03_align/epcr2.sam, 04_exp/wt1_count.txt, 04_exp/wt2_count.txt, 04_exp/epcr1_count.txt, 04_exp/epcr2_count.txt, 05_ft/wt1_gene.gtf, 05_ft/wt2_gene.gtf, 05_ft/epcr1_gene.gtf, 05_ft/epcr2_gene.gtf, 05_ft/wt1_transcript.gtf, 05_ft/wt2_transcript.gtf, 05_ft/epcr1_transcript.gtf, 05_ft/epcr2_transcript.gtf
    jobid: 0

Job counts:
        count   jobs
        1       all
        4       count
        4       fpkm
        4       map
        4       sort2bam
        4       trim
        21
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

But when I really execute it ,it report with error when it running at rule sort2bam:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        4       count
        4       fpkm
        4       map
        4       sort2bam
        4       trim
        21

[Tue Apr 30 03:11:57 2019]
rule trim:
    input: 01_raw/epcr1_1.fastq, 01_raw/epcr1_2.fastq
    output: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_1.unpaired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr1_2.unpaired.fq.gz
    jobid: 3
    wildcards: nico=epcr1

Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
 -threads 16 01_raw/epcr1_1.fastq 01_raw/epcr1_2.fastq 02_clean/epcr1_1.paired.fq.gz 02_clean/epcr1_1.unpaired.fq.gz 02_clean/epcr1_2.paired.fq.gz 02_clean/epcr1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:11:59 2019]
Finished job 3.
1 of 21 steps (5%) done

[Tue Apr 30 03:11:59 2019]
rule trim:
    input: 01_raw/wt1_1.fastq, 01_raw/wt1_2.fastq
    output: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_1.unpaired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt1_2.unpaired.fq.gz
    jobid: 1
    wildcards: nico=wt1

Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
 -threads 16 01_raw/wt1_1.fastq 01_raw/wt1_2.fastq 02_clean/wt1_1.paired.fq.gz 02_clean/wt1_1.unpaired.fq.gz 02_clean/wt1_2.paired.fq.gz 02_clean/wt1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:00 2019]
Finished job 1.
2 of 21 steps (10%) done

[Tue Apr 30 03:12:00 2019]
rule trim:
    input: 01_raw/wt2_1.fastq, 01_raw/wt2_2.fastq
    output: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_1.unpaired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/wt2_2.unpaired.fq.gz
    jobid: 2
    wildcards: nico=wt2

Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
 -threads 16 01_raw/wt2_1.fastq 01_raw/wt2_2.fastq 02_clean/wt2_1.paired.fq.gz 02_clean/wt2_1.unpaired.fq.gz 02_clean/wt2_2.paired.fq.gz 02_clean/wt2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:03 2019]
Finished job 2.
3 of 21 steps (14%) done

[Tue Apr 30 03:12:03 2019]
rule trim:
    input: 01_raw/epcr2_1.fastq, 01_raw/epcr2_2.fastq
    output: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_1.unpaired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 02_clean/epcr2_2.unpaired.fq.gz
    jobid: 4
    wildcards: nico=epcr2

Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
 -threads 16 01_raw/epcr2_1.fastq 01_raw/epcr2_2.fastq 02_clean/epcr2_1.paired.fq.gz 02_clean/epcr2_1.unpaired.fq.gz 02_clean/epcr2_2.paired.fq.gz 02_clean/epcr2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:04 2019]
Finished job 4.
4 of 21 steps (19%) done

[Tue Apr 30 03:12:04 2019]
rule map:
    input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
    output: 03_align/wt2.sam
    log: logs/map/wt2.log
    jobid: 6
    wildcards: nico=wt2

Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:06 2019]
Finished job 6.
5 of 21 steps (24%) done

[Tue Apr 30 03:12:06 2019]
rule map:
    input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
    output: 03_align/epcr1.sam
    log: logs/map/epcr1.log
    jobid: 7
    wildcards: nico=epcr1

Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:07 2019]
Finished job 7.
6 of 21 steps (29%) done

[Tue Apr 30 03:12:07 2019]
rule map:
    input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
    output: 03_align/wt1.sam
    log: logs/map/wt1.log
    jobid: 5
    wildcards: nico=wt1

Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:09 2019]
Finished job 5.
7 of 21 steps (33%) done

[Tue Apr 30 03:12:09 2019]
rule sort2bam:
    input: 03_align/wt1.sam
    output: 03_align/wt1.bam
    jobid: 17
    wildcards: nico=wt1

[W::sam_read1] Parse error at line 64601
samtools sort: truncated file. Aborting
[Tue Apr 30 03:12:09 2019]
Error in rule sort2bam:
    jobid: 17
    output: 03_align/wt1.bam

RuleException:
CalledProcessError in line 45 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Command 'set -euo pipefail;  samtools sort -@ 20 -m 10G -o 03_align/wt1.bam 03_align/wt1.sam' returned non-zero exit status 1.
  File "/root/s/r/snakemake/my_rnaseq_data/Snakefile", line 45, in __rule_sort2bam
  File "/root/miniconda2/envs/py3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /root/s/r/snakemake/my_rnaseq_data/.snakemake/log/2019-04-30T031153.994462.snakemake.log

After that,I stop all running task and I checked the folder,find it has newly generated several files showing as follows:

|-- 02_clean
|   |-- epcr1_1.paired.fq.gz
|   |-- epcr1_1.unpaired.fq.gz
|   |-- epcr1_2.paired.fq.gz
|   |-- epcr1_2.unpaired.fq.gz
|   |-- epcr2_1.paired.fq.gz
|   |-- epcr2_1.unpaired.fq.gz
|   |-- epcr2_2.paired.fq.gz
|   |-- epcr2_2.unpaired.fq.gz
|   |-- wt1_1.paired.fq.gz
|   |-- wt1_1.unpaired.fq.gz
|   |-- wt1_2.paired.fq.gz
|   |-- wt1_2.unpaired.fq.gz
|   |-- wt2_1.paired.fq.gz
|   |-- wt2_1.unpaired.fq.gz
|   |-- wt2_2.paired.fq.gz
|   `-- wt2_2.unpaired.fq.gz
|-- 03_align
|   |-- epcr1.sam
|   |-- wt1.sam
|   `-- wt2.sam

but these files are incomplete. So It make me confused about the order of execution.Is it running parallel of five rule for each sample? or just running all samples rule by rule, The running process of my pipeline seems support the former one. And this also explain the error: "samtools sort: truncated file. Aborting" at sam2bam stage. I don't know whether my guess is right.

But I have add the ruleorder in my Snakefile:

ruleorder: trim > map > sort2bam > fpkm > count

But it seem not working!Is there any other options or setting can control the rule execution order?

And last night I run the snakefile start from the "rule map" based on the trimmed fastq.gz which have been trimmed with same Snakefile. And it running well! the whole running process is showing as follows:

lding DAG of jobs...
viUsing shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        4       count
        4       fpkm
        4       map
        4       sort2bam
        17

[Mon Apr 29 14:37:39 2019]
rule map:
    input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
    output: 03_align/epcr1.sam
    log: logs/map/epcr1.log
    jobid: 19
    wildcards: nico=epcr1

Waiting at most 5 seconds for missing files.
[Mon Apr 29 14:37:40 2019]
Finished job 19.
1 of 17 steps (6%) done

[Mon Apr 29 14:37:40 2019]
rule map:
    input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
    output: 03_align/wt1.sam
    log: logs/map/wt1.log
    jobid: 17
    wildcards: nico=wt1

Waiting at most 5 seconds for missing files.
[Mon Apr 29 14:37:48 2019]
Finished job 17.
2 of 17 steps (12%) done

[Mon Apr 29 14:37:51 2019]
rule map:
    input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
    output: 03_align/wt2.sam
    log: logs/map/wt2.log
    jobid: 18
    wildcards: nico=wt2

[Mon Apr 29 14:37:55 2019]
Finished job 18.
3 of 17 steps (18%) done

[Mon Apr 29 14:37:57 2019]
rule map:
    input: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz
    output: 03_align/epcr2.sam
    log: logs/map/epcr2.log
    jobid: 20
    wildcards: nico=epcr2

[Mon Apr 29 14:38:02 2019]
Finished job 20.
4 of 17 steps (24%) done

[Mon Apr 29 14:38:04 2019]
rule sort2bam:
    input: 03_align/wt1.sam
    output: 03_align/wt1.bam
    jobid: 13
    wildcards: nico=wt1

[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 14:39:45 2019]
Finished job 13.
5 of 17 steps (29%) done

[Mon Apr 29 14:39:46 2019]
rule fpkm:
    input: 03_align/wt1.bam

output: 05_ft/wt1_gene.gtf, 05_ft/wt1_transcript.gtf
    jobid: 9
    wildcards: nico=wt1

[Mon Apr 29 14:40:42 2019]
Finished job 9.
6 of 17 steps (35%) done

[Mon Apr 29 14:40:42 2019]
rule count:
    input: 03_align/wt1.bam
    output: 04_exp/wt1_count.txt
    jobid: 5
    wildcards: nico=wt1
[Mon Apr 29 14:41:40 2019]
Finished job 5.
7 of 17 steps (41%) done

[Mon Apr 29 14:41:40 2019]
rule sort2bam:
    input: 03_align/epcr2.sam
    output: 03_align/epcr2.bam
    jobid: 16
    wildcards: nico=epcr2
[Mon Apr 29 14:56:41 2019]
Finished job 16.
8 of 17 steps (47%) done

[Mon Apr 29 14:56:41 2019]
rule count:
    input: 03_align/epcr2.bam
    output: 04_exp/epcr2_count.txt
    jobid: 8
    wildcards: nico=epcr2

[Mon Apr 29 14:57:45 2019]
Finished job 8.
9 of 17 steps (53%) done

[Mon Apr 29 14:57:45 2019]
rule fpkm:
    input: 03_align/epcr2.bam
    output: 05_ft/epcr2_gene.gtf, 05_ft/epcr2_transcript.gtf
    jobid: 12
    wildcards: nico=epcr2

h[Mon Apr 29 15:01:32 2019]
Finished job 12.
10 of 17 steps (59%) done

[Mon Apr 29 15:01:32 2019]
rule sort2bam:
    input: 03_align/wt2.sam
    output: 03_align/wt2.bam
    jobid: 14
    wildcards: nico=wt2

[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 15:08:05 2019]
Finished job 14.
11 of 17 steps (65%) done

[Mon Apr 29 15:08:05 2019]
rule fpkm:
    input: 03_align/wt2.bam
    output: 05_ft/wt2_gene.gtf, 05_ft/wt2_transcript.gtf
    jobid: 10
    wildcards: nico=wt2

[Mon Apr 29 15:12:28 2019]
Finished job 10.
12 of 17 steps (71%) done

[Mon Apr 29 15:12:28 2019]
rule count:
    input: 03_align/wt2.bam
    output: 04_exp/wt2_count.txt
    jobid: 6
    wildcards: nico=wt2
[Mon Apr 29 15:13:18 2019]
Finished job 6.
13 of 17 steps (76%) done

[Mon Apr 29 15:13:18 2019]
rule sort2bam:
    input: 03_align/epcr1.sam
    output: 03_align/epcr1.bam
    jobid: 15
    wildcards: nico=epcr1

[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 15:21:35 2019]
Finished job 15.
14 of 17 steps (82%) done

[Mon Apr 29 15:21:35 2019]
rule fpkm:
    input: 03_align/epcr1.bam
    output: 05_ft/epcr1_gene.gtf, 05_ft/epcr1_transcript.gtf
    jobid: 11
    wildcards: nico=epcr1

[Mon Apr 29 15:27:22 2019]
Finished job 11.
15 of 17 steps (88%) done

[Mon Apr 29 15:27:22 2019]
rule count:
    input: 03_align/epcr1.bam
    output: 04_exp/epcr1_count.txt
    jobid: 7
    wildcards: nico=epcr1

[Mon Apr 29 15:28:32 2019]
Finished job 7.
16 of 17 steps (94%) done

[Mon Apr 29 15:28:32 2019]
localrule all:
    input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt2_1.paired.fq.gz, 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr2_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 04_exp/wt1_count.txt, 04_exp/wt2_count.txt, 04_exp/epcr1_count.txt, 04_exp/epcr2_count.txt, 05_ft/wt1_gene.gtf, 05_ft/wt2_gene.gtf, 05_ft/epcr1_gene.gtf, 05_ft/epcr2_gene.gtf, 05_ft/wt1_transcript.gtf, 05_ft/wt2_transcript.gtf, 05_ft/epcr1_transcript.gtf, 05_ft/epcr2_transcript.gtf
    jobid: 0

[Mon Apr 29 15:28:32 2019]
Finished job 0.
17 of 17 steps (100%) done

And the execution order of rule is like that: At first, rule map is fully executed.Then execute the left rule for each sample.bam.

Why it's order differ from the whole pipeline start from raw_data?

Summary: two quesions: 1. Is my Snakefile or order of exectuion that make the error ? 2. How to edit my Snakefile to set the order of every rule to execute task rule by rule?

I will apreciate if somebody help!


Solution

  • The problem is that your shell commands in the trim and map rules end with & which sends the working processes to the background and the main process moves on and exits. This signals to Snakemake that the job has finished executing. Snakemake then proceeds to check for the output files that were declared to result from the job, but it can't find them (hence, the errors you see).

    It looks like everything would work if you removed the &s and kept the processes in the foreground.