I have written a Snakefile "prepare_tuples.smk" with prepare_tuples as my head rule. Its input is defined as the output of the rule hadd_tuples.
When I run snakemake prepare_tuples -F -c20 I get the following error message:
Missing input files for rule prepare_tuples:
affected files:
/ceph/users/jmainusch/data/Bs2MuMu/2024_data_stripped.root
/ceph/users/jmainusch/simulation/Bs2MuMu/2024_data_stripped_truth-matched.root
/ceph/users/jmainusch/simulation/Bs2JpsiPhi/2024_data_stripped_truth-matched.root
/ceph/users/jmainusch/simulation/Bu2JpsiK/2024_data_stripped_truth-matched.root
/ceph/users/jmainusch/data/Bd2MuMu/2024_data_stripped.root
/ceph/users/jmainusch/data/Bs2JpsiPhi/2024_data_stripped.root
/ceph/users/jmainusch/data/Bs2KK/2024_data_stripped.root
/ceph/users/jmainusch/simulation/Bd2MuMu/2024_data_stripped_truth-matched.root
/ceph/users/jmainusch/simulation/Bd2KPi/2024_data_stripped_truth-matched.root
/ceph/users/jmainusch/data/Bd2KPi/2024_data_stripped.root
/ceph/users/jmainusch/simulation/Bd2PiPi/2024_data_stripped_truth-matched.root
/ceph/users/jmainusch/data/Bu2JpsiK/2024_data_stripped.root
/ceph/users/jmainusch/simulation/Bs2KK/2024_data_stripped_truth-matched.root
/ceph/users/jmainusch/data/Bd2PiPi/2024_data_stripped.root
/ceph/users/jmainusch/simulation/Bs2KPi/2024_data_stripped_truth-matched.root
/ceph/users/jmainusch/data/Bs2KPi/2024_data_stripped.root
On a side note: "prepare_tuples.smk" is part of a bigger Snakefile. I don't expect this to be a problem, but I mention it for completeness.
From my understanding, Snakemake should recognize that the desired file is produced in "hadd_tuples" and proceed with its production, but it does not.
configfile: "config/config.yaml"
path = config["repopath"]
inpath = config["cephpath"]
outpath = config["userpath"]
dataset = config["dataset"]
configfile: path + "/1_prepare_tuples/configs/tuples.yaml"
wildcard_constraints:
    src = "data|simulation",
    tuple = "^B.*"
rule prepare_tuples:
    input:
        data = expand(outpath + "data/{tuple}/2024_data_stripped.root", tuple = config["tuples"].keys()),
        sim = expand(outpath + "simulation/{tuple}/2024_data_stripped_truth-matched.root", tuple = config["tuples"].keys()),
rule cut:
    input:
        script = path + "1_prepare_tuples/scripts/cut.py",
        in_data = inpath + "{src}/{tuple}/2024/{magnet}/data.root",
        cuts = path + "1_prepare_tuples/configs/cuts.yaml",
        truth_vars = path + "1_prepare_tuples/configs/truth-match.yaml",
        control_vars = path + "3_control_plots/configs/channels/{tuple}/plots.yaml",
        BDTS_vars = path + "4_BDTS/configs/variables.yaml",
    params:
        eff_path = "results/efficiencies.yaml",
        tuple_config = path + "/1_prepare_tuples/configs/tuples.yaml",
    output:
        out_data = outpath + "{src}/{tuple}/{magnet}/2024_data_stripped.root",
    shell:
        """
        python {input.script} \
            --path {input.in_data} \
            --channel {wildcards.tuple} \
            --tuple_config {params.tuple_config} \
            --source {wildcards.src} \
            --magnet {wildcards.magnet} \
            --outpath {output.out_data} \
            --effpath {params.eff_path} \
            --cuts {input.cuts} \
            --truth_vars {input.truth_vars} \
            --control_vars {input.control_vars} \
            --BDTS_vars {input.BDTS_vars} \
        """
rule truthmatch:
    input:
        script = path + "1_prepare_tuples/scripts/truthmatch.py",
        in_data = outpath + "simulation/{tuple}/{magnet}/2024_data_stripped.root",
        cuts = path + "1_prepare_tuples/configs/truth-match.yaml",
        cut_vars = path + "1_prepare_tuples/configs/cuts.yaml",
        control_vars = path + "3_control_plots/configs/channels/{tuple}/plots.yaml",
        BDTS_vars = path + "4_BDTS/configs/variables.yaml",
    params:
        eff_path = "results/efficiencies.yaml",
        tuple_config = path + "/1_prepare_tuples/configs/tuples.yaml",
    output:
        out_data = outpath + "simulation/{tuple}/{magnet}/2024_data_stripped_truth-matched.root",
    shell:
        """
        python {input.script} \
            --path {input.in_data} \
            --channel {wildcards.tuple} \
            --tuple_config {params.tuple_config} \
            --magnet {wildcards.magnet} \
            --outpath {output.out_data} \
            --effpath {params.eff_path} \
            --cuts {input.cuts} \
            --cut_vars {input.cut_vars} \
            --control_vars {input.control_vars} \
            --BDTS_vars {input.BDTS_vars} \
        """
rule hadd_tuples:
    input:
        up = outpath + "{src}/{tuple}/MagUp/2024_data_{mod}.root",
        down = outpath + "{src}/{tuple}/MagDown/2024_data_{mod}.root"
    output:
        outpath + "{src}/{tuple}/2024_data_{mod}.root",
    shell:
        "hadd {output} {input.up} {input.down}"
The issue here is that wildcards in Snakemake can match across the whole of the file path, including '/' separators and literal '.' chars, unless you constrain them otherwise. So I think you want:
wildcard_constraints:
    src = "data|simulation",
    tuple = "B[^/.]+",
    mod = "[^/.]+",
(Using \w+ to specifically match a sequence of regular alphanumeric characters is often a good option, but it won't match hyphens, only underscores.)
This should eliminate the "wildcard periodically repeated" error, which is caused by the fact that your rule hadd_tuples is (unintentionally) recursive. For example, when Snakemake tries to make the file:
/ceph/users/jmainusch/data/Bs2MuMu/MagUp/2024_data_stripped.root
It currently matches that to:
/ceph/users/jmainusch/{src=data}/{tuple=Bs2MuMu/MagUp}/2024_data_{mod=stripped}.root
Obviously {tuple=Bs2MuMu/MagUp} is nonsense, but without a wildcard constraint Snakemake will make this substitution. This in turn produces a nonsense input, which gets matched to the outputs of the same rule, and so on, with the {tuple} wildcard absorbing ever more junk until Snakemake gives up.
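The matching described above can be reproduced outside Snakemake with plain Python regular expressions. This is only a rough sketch: output_regex is a hypothetical helper, and it assumes that an unconstrained wildcard behaves like the regex .+ and that each constraint is dropped into the pattern as a named group, which approximates rather than reproduces what Snakemake does internally.

```python
import re

PREFIX = "/ceph/users/jmainusch/"


def output_regex(constraints):
    """Approximate the hadd_tuples output pattern as a regex (hypothetical helper)."""
    def group(name):
        # Assumption: a wildcard without a constraint matches ".+",
        # which happily swallows "/" and "." characters.
        return "(?P<{}>{})".format(name, constraints.get(name, ".+"))

    return re.compile(
        re.escape(PREFIX)
        + group("src") + "/" + group("tuple")
        + "/2024_data_" + group("mod") + r"\.root$"
    )


# Only src constrained; tuple and mod are left free.
loose = output_regex({"src": "data|simulation"})
# The constraints suggested above.
strict = output_regex({"src": "data|simulation", "tuple": "B[^/.]+", "mod": "[^/.]+"})

per_magnet = PREFIX + "data/Bs2MuMu/MagUp/2024_data_stripped.root"
merged = PREFIX + "data/Bs2MuMu/2024_data_stripped.root"

print(loose.match(per_magnet).groupdict())
# {'src': 'data', 'tuple': 'Bs2MuMu/MagUp', 'mod': 'stripped'}
print(strict.match(per_magnet))
# None
print(strict.match(merged).groupdict())
# {'src': 'data', 'tuple': 'Bs2MuMu', 'mod': 'stripped'}
```

With the stricter constraints the per-magnet file no longer looks like an output of hadd_tuples, so only the rules cut and truthmatch remain as candidates for producing it.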
One other point... consider setting:
workdir: config["userpath"]
rather than adding outpath + to all your inputs and outputs. For one thing, this makes it much easier to test a rule like hadd_tuples in isolation, as well as making the code more legible.
Edited to add note:
From memory, I think using ^ and $ anchors in your wildcard constraints doesn't work because they only match at the beginning/end of the entire filename, not at the start and end of the wildcard. But don't quote me on that, I've not tested it!
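The anchor behaviour can at least be checked at the level of Python's re module. The sketch below assumes the constraint ends up embedded in a larger regex covering the whole path, which is my understanding of how Snakemake handles it but is not verified here; the patterns are hand-written stand-ins, not Snakemake output.

```python
import re

path = "data/Bs2MuMu/2024_data_stripped.root"

# The original constraint "^B.*" placed in the middle of a path-wide pattern.
anchored = re.compile(
    r"(?P<src>data|simulation)/(?P<tuple>^B.*)/2024_data_(?P<mod>.+)\.root$"
)
# The same pattern with the anchor-free constraints suggested above.
unanchored = re.compile(
    r"(?P<src>data|simulation)/(?P<tuple>B[^/.]+)/2024_data_(?P<mod>[^/.]+)\.root$"
)

# "^" only matches at the very start of the string, so it can never
# succeed after "data/" has already been consumed.
print(anchored.match(path))
# None
print(unanchored.match(path).group("tuple"))
# Bs2MuMu
```

If Snakemake does behave this way, the tuple = "^B.*" constraint would stop every rule output containing {tuple} from matching any file, which would be consistent with the "Missing input files" message.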