
Using wildcards in slurm resources directive with snakemake


I'm using snakemake to create rules and submit jobs on our HPC with slurm. To make the output "prettier", I would like to be able to set the job_name argument in the resources directive so that the wildcards being used are integrated into the job name.

For example...

datasets = ["bioethanol", "human", "lake"]

rule clean_data:
  input: 
    script="code/example.sh",
    data="data/{dataset}/input.txt"
  output:
    "data/{dataset}/output.txt",
  resources:
    job_name="{dataset}_clean_data",
    cpus=8,
    mem_mb=45000,
    time_min=3000
  shell:
    """
    {input.script} {input.data}
    """

I have a config.yaml file that looks like this...

# cluster commands
cluster: "sbatch --job-name={resources.job_name}
          --account=my_account 
          --partition=standard 
          --nodes=1 
          --time={resources.time_min} 
          --mem={resources.mem_mb}
          -c {resources.cpus} 
          -o logs_slurm/%x_%j.out"

When I do this, all three jobs are named {dataset}_clean_data; the actual dataset names are never substituted for {dataset}. Is there a way to get the jobs named bioethanol_clean_data, human_clean_data, and lake_clean_data instead?


Solution

  • In the resources directive, you need to use an input function (here a lambda) so that Snakemake substitutes the wildcard values when the rule is evaluated:

      resources:
        job_name=lambda wildcards: f"{wildcards.dataset}_clean_data"  
    

    The drawback is that you have to remember to include this function in every rule's resources. Some other options include:

    • Make slurm outputs go to the snakemake log file, which includes wildcards.
    • Change the name of each rule to include wildcards (have not tested)
    • Use wildcards in the cluster submission script --job-name={rule}_{wildcards}

    I would lean towards using logs, but do check out using wildcards in the job name.
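
    For reference, here is a sketch of the rule from the question with the callable resource in place (same datasets and paths as above; the comma after job_name matters, since resources is a keyword-argument list):

        datasets = ["bioethanol", "human", "lake"]

        rule clean_data:
          input:
            script="code/example.sh",
            data="data/{dataset}/input.txt"
          output:
            "data/{dataset}/output.txt",
          resources:
            # Snakemake calls this lambda with the rule's wildcards object,
            # so the job name becomes e.g. "bioethanol_clean_data"
            job_name=lambda wildcards: f"{wildcards.dataset}_clean_data",
            cpus=8,
            mem_mb=45000,
            time_min=3000
          shell:
            """
            {input.script} {input.data}
            """

    With the cluster command from the config.yaml unchanged, --job-name={resources.job_name} then receives the already-resolved string for each job.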