I am trying to index a reference genome with STAR in a Snakemake pipeline, and I wrote this rule:
rule reference_faidx_star:
    input:
        "../resources/reference/Qrob_PM1N.fa"
    output:
        "../resources/reference/ref/"
    threads: 1
    log:
        "../results/logs/star/star_index.log"
    params:
        gtf="../resources/reference/gff_gtf/Qrob_PM1N_genes_20161004.gtf"
    # resources:
    #     mem_mb=25000
    message:
        """
        INDEX STAR
        """
    shell:
        "STAR --runMode genomeGenerate --runThreadN {threads} --genomeDir {output} --genomeFastaFiles {input} --sjdbGTFfile {params.gtf} --sjdbOverhang 149 --genomeSAindexNbases 12 " # Logging
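For context, --genomeSAindexNbases in this rule sets the length of STAR's suffix-array pre-index. The STAR manual notes that longer values use much more memory, and that for small genomes the value should be scaled down to min(14, log2(GenomeLength)/2 - 1). Below is a small Python sketch that computes that rule of thumb from a FASTA file; the helper names are hypothetical, and this is only the manual's guideline, not a diagnosis of the crash.

```python
import math
from typing import Iterable


def genome_length(fasta_lines: Iterable[str]) -> int:
    """Count sequence characters in FASTA lines, skipping '>' header lines."""
    return sum(len(line.strip()) for line in fasta_lines if not line.startswith(">"))


def recommended_sa_index_nbases(length: int) -> int:
    """Rule of thumb from the STAR manual: min(14, log2(GenomeLength)/2 - 1), rounded down."""
    return min(14, math.floor(math.log2(length) / 2 - 1))


# Hypothetical usage with the FASTA path from the rule:
# with open("../resources/reference/Qrob_PM1N.fa") as fasta:
#     print(recommended_sa_index_nbases(genome_length(fasta)))
```

Passing the open file object lets the whole genome be streamed line by line instead of being loaded into memory.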
At first everything seems to work, but the job (Job 4 in the log) stops partway through. Only four files are created in my output folder: chrLength.txt, chrNameLength.txt, chrName.txt and chrStart.txt, and the terminal displays this:
[Tue Apr 20 09:09:30 2021]
Job 4:
INDEX STAR
Apr 20 09:09:30 ..... started STAR run
Apr 20 09:09:30 ... starting to generate Genome files
Apr 20 09:09:50 ... starting to sort Suffix Array. This may take a long time...
Apr 20 09:09:55 ... sorting Suffix Array chunks and saving them to disk...
/usr/bin/bash: line 1: 7343 Killed STAR --runMode genomeGenerate --runThreadN 1 --genomeDir ../resources/reference/ref/ --genomeFastaFiles ../resources/reference/Qrob_PM1N.fa --sjdbGTFfile ../resources/reference/gff_gtf/Qrob_PM1N_genes_20161004.gtf --sjdbOverhang 149 --genomeSAindexNbases 12 -limitGenomeGenerateRAM 25000000000
[Tue Apr 20 09:10:01 2021]
Error in rule reference_faidx_star:
    jobid: 4
    output: ../resources/reference/ref/
    log: ../results/logs/star/star_index.log (check log file(s) for error message)
    shell:
        STAR --runMode genomeGenerate --runThreadN 1 --genomeDir ../resources/reference/ref/ --genomeFastaFiles ../resources/reference/Qrob_PM1N.fa --sjdbGTFfile ../resources/reference/gff_gtf/Qrob_PM1N_genes_20161004.gtf --sjdbOverhang 149 --genomeSAindexNbases 12
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
I don't understand what is wrong with this rule. And why does the error mention bash, when I am not writing any bash?
I hope you can help me. Thank you, and have a nice day!
This problem has nothing to do with either Snakemake or Python. Snakemake's shell directive simply hands your command to bash, which is why bash appears in the error message. The log shows you the exact command that bash executed while Snakemake ran the pipeline:
STAR --runMode genomeGenerate --runThreadN 1 --genomeDir ../resources/reference/ref/ --genomeFastaFiles ../resources/reference/Qrob_PM1N.fa --sjdbGTFfile ../resources/reference/gff_gtf/Qrob_PM1N_genes_20161004.gtf --sjdbOverhang 149 --genomeSAindexNbases 12 -limitGenomeGenerateRAM 25000000000
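About the "bash strict mode" in the error: as far as I know, Snakemake prefixes shell commands with set -euo pipefail, so any failing command (including one inside a pipeline) fails the whole job. The Python sketch below, which only needs Python and bash and is illustrative rather than Snakemake's own code, shows what the pipefail part changes; bash_exit_code is a hypothetical helper name.

```python
import subprocess


def bash_exit_code(script: str) -> int:
    """Run `script` with `bash -c` and return the exit status bash reports."""
    return subprocess.run(["bash", "-c", script]).returncode


# Without strict mode a pipeline returns the status of its LAST command,
# so the failure of `false` is hidden behind the successful `cat`.
print(bash_exit_code("false | cat"))  # prints 0
# With the strict-mode prefix, pipefail propagates the failure and -e stops the script.
print(bash_exit_code("set -euo pipefail; false | cat"))  # prints 1
```

The -u part additionally makes bash abort on unset variables, which catches typos in shell commands early.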
Something went wrong while the command was running; possible causes include insufficient memory, disk problems, etc. Try running this command directly in bash and check its exit code: that may tell you more about what happened.
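A minimal Python sketch for interpreting that exit code, assuming a POSIX system. It relies on two conventions: Python's subprocess reports a child killed by signal N as -N, while bash reports it as 128 + N. An exit code of 137 (128 + 9, SIGKILL) is what you typically see when the Linux out-of-memory killer stops a process. The names describe_exit, signal_name and run_and_report are hypothetical helpers.

```python
import signal
import subprocess
from typing import Sequence


def signal_name(number: int) -> str:
    """Return e.g. 'SIGKILL' for 9, or a generic label for unknown numbers."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"signal {number}"


def describe_exit(returncode: int) -> str:
    """Turn an exit status into a short human-readable explanation."""
    if returncode == 0:
        return "success"
    if returncode < 0:
        # subprocess reports a child killed by signal N as -N.
        return f"killed by {signal_name(-returncode)}"
    if returncode > 128:
        # A shell reports a child killed by signal N as 128 + N (137 = SIGKILL),
        # but a program may also exit with such a code itself, hence "probably".
        return f"exited with code {returncode} (probably killed by {signal_name(returncode - 128)})"
    return f"exited with code {returncode}"


def run_and_report(cmd: Sequence[str]) -> str:
    """Run a command directly (no shell) and describe how it ended."""
    return describe_exit(subprocess.run(list(cmd)).returncode)
```

For example, run_and_report(["bash", "-c", "kill -9 $$"]) returns "killed by SIGKILL", because that bash process kills itself with signal 9. To check STAR, pass the command from the log split into a list of arguments.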
One useful Snakemake trick is the --printshellcmds flag (short form -p): it makes Snakemake print every shell command it runs. You can then repeat these commands manually, keeping all temporary files, and locate the problem.
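--printshellcmds is often combined with Snakemake's dry-run flag -n, so the commands are printed but nothing is executed, and you can copy the failing one into a terminal. On the command line this is simply snakemake --cores 1 --printshellcmds -n. The Python sketch below only builds and launches that same call; the helper name snakemake_print_cmd is hypothetical, and the launch part only runs if snakemake is on your PATH.

```python
import shutil
import subprocess
from typing import List


def snakemake_print_cmd(targets: List[str]) -> List[str]:
    """Arguments for a dry run (-n) that prints every shell command (--printshellcmds)."""
    return ["snakemake", "--cores", "1", "--printshellcmds", "-n", *targets]


if __name__ == "__main__":
    # Run from the directory containing your Snakefile; skipped if snakemake is missing.
    if shutil.which("snakemake"):
        subprocess.run(snakemake_print_cmd([]), check=False)
```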