Snakemake waiting to finish all parallel jobs before starting next parallel job

I have written a Snakemake rule that runs Muscle (an MSA tool) to calculate a multiple sequence alignment (MSA) for every file in a directory. The task is trivially parallel, as the files do not depend on each other. The problem is that Snakemake runs this rule in "batches" of n jobs, where n is the number of cores given to Snakemake as an argument:

snakemake -j 4 msa

Snakemake starts by running 4 jobs in parallel, then waits until all of them have finished before starting a new "batch" of 4 jobs. This wastes CPU time, because the input files vary a lot in size and their MSA calculation time ranges from seconds to minutes. The result is the following execution flow:

job1|-----           |job5|-----     |...|->
job2|---             |job6|--------  |...|->
job3|----------------|job7|--        |...|->
job4|-               |job8|----------|...|->
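To put a number on the idle time in the diagram, here is a minimal Python sketch that compares the batched behaviour with a scheduler that starts a new job as soon as any core becomes free. The durations are hypothetical values read off the dash counts in the diagram (arbitrary units), and the function names are made up for illustration; this is not how Snakemake schedules jobs internally.

```python
import heapq

def batch_makespan(durations, cores):
    """Wall time when jobs run in fixed batches of `cores` jobs.

    Each batch lasts as long as its slowest job.
    """
    total = 0
    for i in range(0, len(durations), cores):
        total += max(durations[i:i + cores])
    return total

def greedy_makespan(durations, cores):
    """Wall time when the next job starts as soon as any core is free."""
    free_at = [0] * cores  # min-heap of the times at which each core frees up
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available core
        heapq.heappush(free_at, start + d)  # core is busy until start + d
    return max(free_at)

# Hypothetical durations for job1..job8, taken from the diagram (arbitrary units)
durations = [5, 3, 16, 1, 5, 8, 2, 10]

print(batch_makespan(durations, 4))   # 26
print(greedy_makespan(durations, 4))  # 16
```

With these numbers the batched run takes 26 units while the greedy one takes 16, because job3 alone holds back the whole second batch.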

How could I tell Snakemake to truly parallelize the jobs?

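# WORK_DIR and MUSCLE_PATH are defined elsewhere in the Snakefile.
# Collect the {id} part of every FASTA file in the input directory.
CLUSTER_IDS, = glob_wildcards(os.path.join(WORK_DIR, "fasta", "{id}.fasta"))

# Target rule: request one alignment per input file.
rule msa:
    input:
        expand(os.path.join(WORK_DIR, "msa", "{id}.afa"), id=CLUSTER_IDS)

# Anonymous rule: align a single FASTA file with Muscle.
rule:
    input:
        os.path.join(WORK_DIR, "fasta", "{id}.fasta")
    output:
        os.path.join(WORK_DIR, "msa", "{id}.afa")
    shell:
        "{MUSCLE_PATH}/muscle3.8.31_i86darwin64 -in {input} -out {output}"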

Solution

The issue was solved by updating Snakemake from version 5.30.1 to version 6.5.2.
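To check which version is installed before rerunning the workflow, something like the following sketch could be used. It assumes Python 3.8+ (for `importlib.metadata`) and that Snakemake was installed as the `snakemake` package. The helper names are hypothetical, and using 6.5.2 as the threshold is an assumption: it is only the version that worked here, not necessarily the first release that fixes the behaviour.

```python
import re
from importlib.metadata import PackageNotFoundError, version

def version_tuple(text):
    """Turn '6.5.2' into (6, 5, 2) so versions compare numerically."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])

def snakemake_is_at_least(minimum="6.5.2"):
    """Return True if the installed Snakemake is at least `minimum`.

    Returns False when Snakemake is not installed in this environment.
    The default threshold is an assumption taken from this post.
    """
    try:
        installed = version("snakemake")
    except PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)

print(snakemake_is_at_least())
```

Comparing tuples of integers rather than raw strings matters here: as strings, "5.30.1" would sort before "5.4.0".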