
Snakemake on cluster: OutputException and submit one job for each wildcard item


I am trying to use Snakemake on LSF with the LSF profile, but only one job is submitted even though the rule uses a wildcard.

Submitted job 1 with external jobid '660343 logs/cluster/try_expand/unique/jobid1_4530cab3-d29c-485d-8d46-871fb7042e50.out'.

Below is a minimal example, run with:

snakemake --profile lsf -s try.smk 2> `date +"%Y%m%d_%H%M"`_snakemake_try.log --latency-wait 20

where try.smk is:

CHROMOSOMES = [ 20, 21, 22]

rule targets:
    input: 
         expand("try/chr{chromosome}.GATK_calls.indels.PASS.common_var_2.bcf", chromosome=CHROMOSOMES)
    log:
        "try_logs/targets.log"

rule try_expand:
    threads: 6
    output:
        expand("try/chr{chromosome}.GATK_calls.indels.PASS.common_var_2.bcf", chromosome=CHROMOSOMES) 
    shell:"""
        touch {output}
    """

The log file of the above command is here. I suspect the same behavior is the reason for the MissingOutputException I get when running larger tasks, where the first wildcard value takes a long time to complete:

Waiting at most 20 seconds for missing files.
MissingOutputException in line 22 of extraction.smk:
Missing files after 20 seconds:
chr21.GATK_calls.indels.PASS.common_var.bcf
chr22.GATK_calls.indels.PASS.common_var.bcf

How can I avoid the OutputException and submit each wildcard item as a job? Thanks!


Solution

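  • You're confusing a wildcard with a placeholder of the expand function. Your rule try_expand lists all three files in its output, so it is run only once and is expected to produce all of your targets. In that output, {chromosome} is not a wildcard but a placeholder that expand fills in from its chromosome=CHROMOSOMES argument. If that single job writes only one of the files, the others are reported as missing, which would explain the MissingOutputException.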

    What you probably want is:

    CHROMOSOMES = [ 20, 21, 22]
    
    rule targets:
        input: 
             expand("try/chr{chromosome}.GATK_calls.indels.PASS.common_var_2.bcf", chromosome=CHROMOSOMES)
        log:
            "try_logs/targets.log"
    
    rule try_expand:
        threads: 6
        output:
            "try/chr{chromosome}.GATK_calls.indels.PASS.common_var_2.bcf" 
        shell:
            """
            touch {output}
            """
    

    Note that if you need to keep a wildcard inside an expand function, you have to double its braces ({{ }}).
    For example:

    output: expand("{path}/chr{{chromosome}}.GATK_calls.indels.PASS.common_var_2.bcf", path="/my/path")
    

    Here, {path} is a placeholder filled in from the path argument of the expand function, while {{chromosome}} remains a wildcard.
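
To make the difference between single and doubled braces concrete, here is a minimal plain-Python sketch that runs without Snakemake. The helper expand_sketch is hypothetical and is not Snakemake's actual implementation; it only assumes that placeholders are filled the way Python's str.format fills them, which is enough to show why single braces give three concrete paths while doubled braces leave a wildcard behind.

```python
from itertools import product


def expand_sketch(pattern, **values):
    """Hypothetical stand-in for expand(): fill every placeholder with each
    combination of the supplied values, using plain str.format."""
    keys = list(values)
    # Accept a single value as well as a list (a convenience of this sketch).
    pools = [v if isinstance(v, (list, tuple)) else [v] for v in values.values()]
    return [pattern.format(**dict(zip(keys, combo))) for combo in product(*pools)]


CHROMOSOMES = [20, 21, 22]

# Single braces: the placeholder is filled, so the result is three concrete
# paths. Used as a rule's output, this declares all three files for one job.
concrete = expand_sketch(
    "try/chr{chromosome}.GATK_calls.indels.PASS.common_var_2.bcf",
    chromosome=CHROMOSOMES,
)

# Doubled braces: str.format turns {{chromosome}} into a literal {chromosome},
# which is left in the pattern as a wildcard.
with_wildcard = expand_sketch(
    "{path}/chr{{chromosome}}.GATK_calls.indels.PASS.common_var_2.bcf",
    path="/my/path",
)

print(concrete)
print(with_wildcard)
```

The first list contains one fully resolved path per chromosome, while the second contains a single pattern that still holds {chromosome}, so a rule using it as output can be instantiated once per requested chromosome.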