Tags: python, mapping, pipeline, snakemake

STAR genome index generation fails at step 4


I am trying to index a reference genome with STAR in a Snakemake pipeline, and I wrote this rule:

rule reference_faidx_star:
    input:
        "../resources/reference/Qrob_PM1N.fa"
    output:
        "../resources/reference/ref/"
    threads: 1
    log: 
        "../results/logs/star/star_index.log"
    params:
        gtf= "../resources/reference/gff_gtf/Qrob_PM1N_genes_20161004.gtf"
    # resources:
    #     mem_mb=25000
    message:
        """
        INDEX STAR 
        """
    shell:
        "STAR --runMode genomeGenerate --runThreadN {threads} --genomeDir {output} --genomeFastaFiles {input} --sjdbGTFfile {params.gtf} --sjdbOverhang 149 --genomeSAindexNbases 12 " # Logging

At first everything works, but the job breaks at step 4. Only four files are created in my folder (chrLength.txt, chrNameLength.txt, chrName.txt and chrStart.txt), and the terminal displays this:


[Tue Apr 20 09:09:30 2021]
Job 4: 
        INDEX STAR 
        

Apr 20 09:09:30 ..... started STAR run
Apr 20 09:09:30 ... starting to generate Genome files
Apr 20 09:09:50 ... starting to sort Suffix Array. This may take a long time...
Apr 20 09:09:55 ... sorting Suffix Array chunks and saving them to disk...
/usr/bin/bash: line 1:  7343 Killed      STAR --runMode genomeGenerate --runThreadN 1 --genomeDir ../resources/reference/ref/ --genomeFastaFiles ../resources/reference/Qrob_PM1N.fa --sjdbGTFfile ../resources/reference/gff_gtf/Qrob_PM1N_genes_20161004.gtf --sjdbOverhang 149 --genomeSAindexNbases 12 -limitGenomeGenerateRAM 25000000000
[Tue Apr 20 09:10:01 2021]
Error in rule reference_faidx_star:
    jobid: 4
    output: ../resources/reference/ref/
    log: ../results/logs/star/star_index.log (check log file(s) for error message)
    shell:
        STAR --runMode genomeGenerate --runThreadN 1 --genomeDir ../resources/reference/ref/ --genomeFastaFiles ../resources/reference/Qrob_PM1N.fa --sjdbGTFfile ../resources/reference/gff_gtf/Qrob_PM1N_genes_20161004.gtf --sjdbOverhang 149 --genomeSAindexNbases 12 
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!) 
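A side note on memory: the STAR command that bash reports as stopped carries -limitGenomeGenerateRAM 25000000000, which is 25000 MB written in bytes, the same figure as the commented-out mem_mb=25000 in the rule. STAR's manual spells this option with two dashes (--limitGenomeGenerateRAM), and raising the limit does not create memory: if STAR needs more than the machine has, the process can still be killed. The sketch below, assuming a Linux or other Unix host, converts a mem_mb value to STAR's byte count and compares it with the installed RAM; the helper names are hypothetical.

```python
import os


def star_ram_limit_bytes(mem_mb: int) -> int:
    """Convert a Snakemake-style mem_mb value to the byte count STAR expects.

    Uses 1 MB = 1_000_000 bytes, which reproduces the 25000 -> 25000000000
    pair seen in the question.
    """
    return mem_mb * 1_000_000


def physical_ram_bytes() -> int:
    """Total physical memory, via POSIX sysconf (available on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def star_ram_args(mem_mb: int) -> list[str]:
    """Build the STAR memory-limit arguments, warning if they exceed installed RAM."""
    limit = star_ram_limit_bytes(mem_mb)
    total = physical_ram_bytes()
    if limit > total:
        # If STAR really needs this much, the machine cannot supply it, and on
        # Linux the out-of-memory killer typically ends the process with SIGKILL.
        print(f"warning: requested {limit} bytes, but only {total} are installed")
    return ["--limitGenomeGenerateRAM", str(limit)]


print(star_ram_args(25000))
```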

I don't understand what is wrong with this rule. Why does the error mention bash when I am not writing any bash?

I hope you can help me. Thank you, and have a nice day!


Solution

  • This problem has nothing to do with either Snakemake or Python. The log shows the exact command that bash executes while Snakemake runs the pipeline:

    STAR --runMode genomeGenerate --runThreadN 1 --genomeDir ../resources/reference/ref/ --genomeFastaFiles ../resources/reference/Qrob_PM1N.fa --sjdbGTFfile ../resources/reference/gff_gtf/Qrob_PM1N_genes_20161004.gtf --sjdbOverhang 149 --genomeSAindexNbases 12 -limitGenomeGenerateRAM 25000000000
    

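    Something went wrong while the command was running, for example insufficient memory or disk problems. Here bash reports that the STAR process was killed, i.e. it received SIGKILL; during suffix-array sorting that commonly means the system ran out of memory. Try running this command directly in bash and check the return code: that may tell you more about what happened.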

    One useful Snakemake trick is the --printshellcmds (-p) flag: it prints every shell command Snakemake runs. You can then repeat these commands manually, keeping all temporary files, and locate the problem.
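To follow the advice above from a script rather than typing echo $? after the command, a minimal Python sketch could look like this. It assumes a POSIX system; run_and_report is a hypothetical helper, and the demo child process kills itself with SIGKILL only to imitate what the kernel's out-of-memory killer does. When a process dies from a signal, Python's subprocess reports the negative signal number, while bash reports 128 plus the signal number (137 for SIGKILL).

```python
import signal
import subprocess
import sys


def run_and_report(cmd: list[str]) -> int:
    """Run cmd (no shell involved) and print a short explanation of how it ended."""
    rc = subprocess.run(cmd).returncode
    if rc < 0:
        # subprocess encodes "terminated by signal N" as a return code of -N.
        sig = signal.Signals(-rc)
        # bash would show 128 + N for the same event (137 for SIGKILL).
        print(f"killed by {sig.name}; bash would report exit status {128 - rc}")
        if sig == signal.SIGKILL:
            print("SIGKILL often comes from the kernel's out-of-memory killer")
    elif rc != 0:
        print(f"exited with error status {rc}")
    else:
        print("finished successfully")
    return rc


if __name__ == "__main__":
    # Demo only: a child Python process that sends itself SIGKILL,
    # imitating an out-of-memory kill. Swap in the real STAR argv to check it.
    demo = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    run_and_report(demo)
```

To check the real job, pass the STAR command from the log as a list of arguments, for example by splitting the logged string with shlex.split.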