I have an issue with the following Snakemake pipeline. I get an error after a dry run of the Snakefile:
Building DAG of jobs... WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line, or have a rule without wildcards at the very top of your workflow (e.g. the typical "rule all" which just collects all results you want to generate in the end)
I have tried many things in the rule all part, including listing the required files manually so that it contains no wildcards, but it did not help. I also uninstalled my old version of Snakemake and installed the latest one, which had no effect either.
Here is the Snakefile; any help would be highly appreciated:
import os
import pandas as pd

# Read beta and b combinations from CSV
beta_b_values = []
with open("beta_b_combinations.csv", "r") as f:
    next(f)  # Skip header
    for line in f:
        beta, b = line.strip().split(",")
        safe_beta = beta.replace(".", "_")
        safe_b = b.replace(".", "_")
        beta_b_values.append((safe_beta, safe_b))

# Print beta_b_values after defining it for debugging
print("Loaded beta_b_values:", beta_b_values)

# Define paths dynamically
def get_folder(beta, b):
    return f"beta_{beta}_b_{b}"

def get_data_folder(beta, b):
    return f"{get_folder(beta, b)}/data_1_first500"

# Step 1: Run C++ Simulations via Bash Script
rule run_simulations:
    output:
        "{folder}/data_1_first500/replica_{i}.csv"
    params:
        executable="metropolis_extended"
    shell:
        """
        set -e  # Stop script on any error
        bash run_metropolis_extended.sh {params.executable} {wildcards.folder} {wildcards.i} {output}
        """

# Step 2: Merge CSV Files After Simulations
rule merge_replicas:
    input:
        "simulations_done.flag",
        expand("{folder}/data_1_first500/replica_{i}.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values], i=range(1, 501))
    output:
        expand("{folder}/merged_replicas.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            python merge_files.py --folder "$folder/data_1_first500" --output "$folder/merged_replicas.csv"
        done
        """

# Step 3: Compute Means After Merging
rule compute_means:
    input:
        expand("{folder}/merged_replicas.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        expand("{folder}/merged_replicas_with_means.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            python merged_replicas_with_means.py --input "$folder/merged_replicas.csv" --output "$folder/merged_replicas_with_means.csv"
        done
        """

# Step 4: Generate Plots
rule generate_plots:
    input:
        expand("{folder}/merged_replicas_with_means.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        expand("{folder}/plots_done.flag", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            Rscript plot_results.R --output "$folder"
            touch "$folder/plots_done.flag"
        done
        """

# Step 5: Compute Thermalized Averages
rule compute_thermalized_averages:
    input:
        expand("{folder}/merged_replicas_with_means.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        expand("{folder}/thermalized_averages.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values]),
        expand("{folder}/thermalized_averages_done.flag", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            Rscript thermalized_quantities.R --input "$folder/merged_replicas_with_means.csv" --output "$folder/thermalized_averages.csv"
            touch "$folder/thermalized_averages_done.flag"
        done
        """

# Step 6: Compute Errors via Jupyter Notebook
rule compute_errors:
    input:
        expand("{folder}/thermalized_averages.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        ["errors_computed.flag", "beta_b_and_means_with_errors.csv"]
    shell:
        """
        papermill computing_errors.ipynb computing_errors_output.ipynb
        touch errors_computed.flag
        """

# Step 7: Collect all results into a Single CSV file and generate Final Plots
rule generate_final_plots:
    input:
        "beta_b_and_means_with_errors.csv"
    output:
        "errors_all_beta_b_combinations.csv",
        "magnetization_plot.png",
        "hamiltonian_plot.png"
    shell:
        """
        python collect_and_plot_errors.py
        """

# Precompute the list of plots_done.flag files
plots_done_files = [f"{get_folder(beta, b)}/plots_done.flag" for beta, b in beta_b_values]

# Final Rule: Defines Overall Workflow Goal
rule all:
    input:
        # Explicitly list all plots_done.flag files
        plots_done_files,
        "errors_all_beta_b_combinations.csv",
        "magnetization_plot.png",
        "hamiltonian_plot.png"
By default, Snakemake sees the first rule in the Snakefile as its target. Thus, if you just run something like snakemake -n, it tries to solve for rule run_simulations, whose output pattern still contains the {folder} and {i} wildcards, which is exactly what the error complains about. If you move rule all above rule run_simulations, it should work.
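For illustration, here is a minimal sketch of the reordered Snakefile start, reusing your own definitions; note that beta_b_values, get_folder() and plots_done_files must still be defined before rule all, because the rule refers to them:

import os

# Read beta and b combinations from CSV (same as in the original Snakefile)
beta_b_values = []
with open("beta_b_combinations.csv", "r") as f:
    next(f)  # Skip header
    for line in f:
        beta, b = line.strip().split(",")
        safe_beta = beta.replace(".", "_")
        safe_b = b.replace(".", "_")
        beta_b_values.append((safe_beta, safe_b))

def get_folder(beta, b):
    return f"beta_{beta}_b_{b}"

# Must be defined before rule all references it
plots_done_files = [f"{get_folder(beta, b)}/plots_done.flag" for beta, b in beta_b_values]

# rule all comes first, so it becomes the default target of a plain `snakemake -n`
rule all:
    input:
        plots_done_files,
        "errors_all_beta_b_combinations.csv",
        "magnetization_plot.png",
        "hamiltonian_plot.png"

# Step 1: Run C++ Simulations via Bash Script
# (run_simulations and all remaining rules follow here, unchanged)

Alternatively, you can keep the current rule order and name the target explicitly on the command line, e.g. snakemake -n all, which is the other option the error message suggests.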