Snakemake: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line


I have issues with the following Snakemake pipeline. I get an error after a dry run of the Snakefile:

Building DAG of jobs...
WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line, or have a rule without wildcards at the very top of your workflow (e.g. the typical "rule all" which just collects all results you want to generate in the end)

I tried several changes to rule all, including listing the required files manually to avoid wildcards, but that did not help. I also uninstalled the old version of Snakemake and installed the latest one, with no effect.

Here is the Snakefile. Any help would be highly appreciated:

import os
import pandas as pd


# Read beta and b combinations from CSV
beta_b_values = []
with open("beta_b_combinations.csv", "r") as f:
    next(f)  # Skip header
    for line in f:
        beta, b = line.strip().split(",")
        safe_beta = beta.replace(".", "_")
        safe_b = b.replace(".", "_")
        beta_b_values.append((safe_beta, safe_b))
    
# Print beta_b_values after defining it for debugging
print("Loaded beta_b_values:", beta_b_values)        

# Define paths dynamically
def get_folder(beta, b):
    return f"beta_{beta}_b_{b}"

def get_data_folder(beta, b):
    return f"{get_folder(beta, b)}/data_1_first500"

# Step 1: Run C++ Simulations via Bash Script
rule run_simulations:
    output:
        "{folder}/data_1_first500/replica_{i}.csv"
    params:
        executable="metropolis_extended"
    shell:
        """
        set -e  # Stop script on any error
        bash run_metropolis_extended.sh {params.executable} {wildcards.folder}  {wildcards.i} {output}
        """

# Step 2: Merge CSV Files After Simulations
rule merge_replicas:
    input:
        "simulations_done.flag",
        expand("{folder}/data_1_first500/replica_{i}.csv", folder=[get_folder(b[0],  b[1]) for b in beta_b_values], i=range(1, 501))
    output:
        expand("{folder}/merged_replicas.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            python merge_files.py --folder "$folder/data_1_first500" --output "$folder/merged_replicas.csv"
        done
        """


# Step 3: Compute Means After Merging
rule compute_means:
    input:
        expand("{folder}/merged_replicas.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        expand("{folder}/merged_replicas_with_means.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            python merged_replicas_with_means.py --input "$folder/merged_replicas.csv" --output "$folder/merged_replicas_with_means.csv"
        done
        """

# Step 4: Generate Plots
rule generate_plots:
    input:
        expand("{folder}/merged_replicas_with_means.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        expand("{folder}/plots_done.flag", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            Rscript plot_results.R --output "$folder"
            touch "$folder/plots_done.flag"
        done
        """

# Step 5: Compute Thermalized Averages
rule compute_thermalized_averages:
    input:
        expand("{folder}/merged_replicas_with_means.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        expand("{folder}/thermalized_averages.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values]),
        expand("{folder}/thermalized_averages_done.flag", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            Rscript thermalized_quantities.R --input "$folder/merged_replicas_with_means.csv" --output "$folder/thermalized_averages.csv"
            touch "$folder/thermalized_averages_done.flag"
        done
        """

# Step 6: Compute Errors via Jupyter Notebook
rule compute_errors:
    input:
        expand("{folder}/thermalized_averages.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        ["errors_computed.flag", "beta_b_and_means_with_errors.csv"]
    shell:
        """
        papermill computing_errors.ipynb computing_errors_output.ipynb
        touch errors_computed.flag
        """

# Step 7: Collect all results into a Single CSV file and generate Final Plots
rule generate_final_plots:
    input:
        "beta_b_and_means_with_errors.csv"
    output:
        "errors_all_beta_b_combinations.csv",
        "magnetization_plot.png",
        "hamiltonian_plot.png"
    shell:
        """
        python collect_and_plot_errors.py
        """


# Precompute the list of plots_done.flag files
plots_done_files = [f"{get_folder(beta, b)}/plots_done.flag" for beta, b in beta_b_values]

# Final Rule: Defines Overall Workflow Goal
rule all:
    input:
        # Explicitly list all plots_done.flag files
        plots_done_files,
        "errors_all_beta_b_combinations.csv",
        "magnetization_plot.png",
        "hamiltonian_plot.png"

Solution

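  • By default, Snakemake uses the first rule in the Snakefile as its target. So if you run something like snakemake -n without naming a target, it tries to build rule run_simulations, whose output contains the wildcards {folder} and {i}, and that triggers the error. If you move rule all (together with the plots_done_files definition it uses) above rule run_simulations, it should work. Alternatively, name the target rule on the command line, e.g. snakemake -n all.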
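
As a minimal sketch of the reordered layout (not the asker's full workflow): the (beta, b) pairs below are hypothetical stand-ins for the rows of beta_b_combinations.csv, FOLDERS is a name introduced here, and the shell command is only a placeholder for the real plotting step. The point is the ordering: the target list is computed first, and rule all comes before every rule whose output contains wildcards.

```snakemake
# Hypothetical (beta, b) pairs standing in for the rows of beta_b_combinations.csv.
beta_b_values = [("0_5", "1_0"), ("1_0", "2_0")]

# Folder names, built the same way as get_folder() in the question.
# FOLDERS is a name introduced for this sketch.
FOLDERS = [f"beta_{beta}_b_{b}" for beta, b in beta_b_values]

# First rule in the file, so it becomes the default target.
# It lists only concrete files and therefore contains no wildcards.
rule all:
    input:
        expand("{folder}/plots_done.flag", folder=FOLDERS)

# Any rule with wildcards in its output goes below rule all.
# The touch command is a placeholder for the real plotting command.
rule generate_plots:
    output:
        "{folder}/plots_done.flag"
    shell:
        "touch {output}"
```

With this layout, snakemake -n should resolve rule all and derive one generate_plots job per folder from the wildcard pattern. Writing the rule with a {folder} wildcard, rather than expand() in output plus a shell loop, is a design choice of this sketch: it lets Snakemake schedule each folder as a separate job. It also sidesteps the {{...}} expressions in the original shell blocks, since doubled braces in a shell string are escapes for literal braces and would reach bash unevaluated.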