I am trying to create a symlink-directory structure for aliasing output directories in a Snakemake workflow.
Let's consider the following example:
A long time ago in a galaxy far, far away, somebody wanted to find the best ice cream flavour in the universe and conducted a survey. Our example workflow aims at representing the votes by a directory structure. The survey was conducted in English (because that's what they all speak in that foreign galaxy), but the results should be understood by non-English speakers as well. Symbolic links come to the rescue.
To make the input parsable for us humans as well as Snakemake, we stick them into a YAML file:
cat config.yaml
flavours:
  chocolate:
    - vader
    - luke
    - han
  vanilla:
    - yoda
    - leia
  berry:
    - windu
translations:
  french:
    chocolat: chocolate
    vanille: vanilla
    baie: berry
  german:
    schokolade: chocolate
    vanille: vanilla
    beere: berry
To create the corresponding directory tree, I started with this simple Snakefile:
### Setup ###

configfile: "config.yaml"


### Targets ###

votes = ["english/" + flavour + "/" + voter
         for flavour, voters in config["flavours"].items()
         for voter in voters]

translations = {language + "_translation/" + translation
                for language, translations in config["translations"].items()
                for translation in translations.keys()}


### Commands ###

create_file_cmd = "touch '{output}'"

relative_symlink_cmd = "ln --symbolic --relative '{input}' '{output}'"


### Rules ###

rule all:
    input: votes, translations

rule english:
    output: "english/{flavour}/{voter}"
    shell: create_file_cmd

rule translation:
    input: lambda wc: "english/" + config["translations"][wc.lang][wc.trans]
    output: "{lang}_translation/{trans}"
    shell: relative_symlink_cmd
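The two target comprehensions can be evaluated in plain Python to see what rule all actually requests. This is a sketch under two assumptions. First, CONFIG is a hand-written dict standing in for the parsed config.yaml. Second, str.format is a simplified stand-in for Snakemake's own formatting of shell commands. The helpers translation_input and symlink_command are hypothetical names. The sketch shows that every translation job gets a bare english/<flavour> directory as its input.

```python
# Hand-written stand-in for the parsed config.yaml (Snakemake exposes it as `config`).
CONFIG = {
    "flavours": {
        "chocolate": ["vader", "luke", "han"],
        "vanilla": ["yoda", "leia"],
        "berry": ["windu"],
    },
    "translations": {
        "french": {"chocolat": "chocolate", "vanille": "vanilla", "baie": "berry"},
        "german": {"schokolade": "chocolate", "vanille": "vanilla", "beere": "berry"},
    },
}

relative_symlink_cmd = "ln --symbolic --relative '{input}' '{output}'"

# Same comprehensions as in the Snakefile (inner loop variable renamed for clarity).
votes = ["english/" + flavour + "/" + voter
         for flavour, voters in CONFIG["flavours"].items()
         for voter in voters]

translations = {language + "_translation/" + translation
                for language, mapping in CONFIG["translations"].items()
                for translation in mapping}


def translation_input(lang, trans):
    """Mirror of the lambda used as the input of rule translation."""
    return "english/" + CONFIG["translations"][lang][trans]


def symlink_command(lang, trans):
    """Approximate shell command of one translation job.

    Simplification: plain str.format instead of Snakemake's own formatting.
    """
    return relative_symlink_cmd.format(
        input=translation_input(lang, trans),
        output=lang + "_translation/" + trans,
    )


if __name__ == "__main__":
    print(votes)
    print(sorted(translations))
    print(symlink_command("french", "vanille"))
```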
I am sure there are more 'pythonic' ways to achieve what I wanted, but this is just a quick example to illustrate my problem.
Running the above workflow with snakemake, I get the following error:
Building DAG of jobs...
MissingInputException in line 33 of /tmp/snakemake.test/Snakefile
Missing input files for rule translation:
english/vanilla
So while Snakemake is clever enough to create the english/<flavour> directories when attempting to make an english/<flavour>/<voter> file, it seems to 'forget' about the existence of this directory when using it as an input to make a <language>_translation/<flavour> symlink.
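A simplified model of wildcard matching is one way to reason about the MissingInputException. The sketch assumes that a path can only be produced if some rule's output pattern matches it, and that each wildcard matches the regular expression .+ by default. This is a simplification, not a description of Snakemake's actual code. pattern_to_regex, RULES and producing_rules are hypothetical names. Under that model no pattern matches english/vanilla. The directory appears on disk as a side effect of rule english, but it is never a declared output of any rule.

```python
import re


def pattern_to_regex(pattern, constraints=None):
    """Simplified model of turning a Snakemake output pattern into a regex.

    Assumption: every {wildcard} matches `.+` unless a constraint is given.
    """
    constraints = constraints or {}
    parts, pos = [], 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        parts.append("(?P<%s>%s)" % (name, constraints.get(name, ".+")))
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts) + "$")


# Output patterns of the first Snakefile, keyed by rule name.
RULES = {
    "english": "english/{flavour}/{voter}",
    "translation": "{lang}_translation/{trans}",
}


def producing_rules(path, rules=RULES):
    """Names of the rules whose output pattern matches `path`."""
    return [name for name, pattern in rules.items()
            if pattern_to_regex(pattern).match(path)]


if __name__ == "__main__":
    for path in ["english/vanilla/yoda", "french_translation/vanille", "english/vanilla"]:
        print(path, producing_rules(path))
```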
As an intermediate step, I applied the following patch to the Snakefile:
27c27
< input: votes, translations
---
> input: votes#, translations
Now, the workflow ran through and created the english directory as expected (snakemake -q output only):
Job counts:
count jobs
1 all
6 english
7
Now with the target directories created, I went back to the initial version of the Snakefile and re-ran it:
Job counts:
count jobs
1 all
6 translation
7
ImproperOutputException in line 33 of /tmp/snakemake.test/Snakefile
Outputs of incorrect type (directories when expecting files or vice versa). Output directories must be flagged with directory(). for rule translation:
french_translation/chocolat
Exiting because a job execution failed. Look above for error message
While I am not sure whether a symlink to a directory qualifies as a directory, I went ahead and applied a new patch to follow the suggestion:
35c35
< output: "{lang}_translation/{trans}"
---
> output: directory("{lang}_translation/{trans}")
With that, snakemake finally created the symlinks:
Job counts:
count jobs
1 all
6 translation
7
As a confirmation, here is the resulting directory structure:
english
├── berry
│ └── windu
├── chocolate
│ ├── han
│ ├── luke
│ └── vader
└── vanilla
├── leia
└── yoda
french_translation
├── baie -> ../english/berry
├── chocolat -> ../english/chocolate
└── vanille -> ../english/vanilla
german_translation
├── beere -> ../english/berry
├── schokolade -> ../english/chocolate
└── vanille -> ../english/vanilla
9 directories, 6 files
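This bears on the question of whether a symlink to a directory counts as a directory. Python's os.path.isdir follows symbolic links, while os.path.islink does not, so such a link reports as both. Snakemake's output type check may behave similarly. That is an assumption, not verified against its source, but it would explain why the plain output was rejected and directory() was accepted. build_example and describe are hypothetical helpers.

```python
import os
import tempfile


def build_example(root):
    """Create english/berry and a relative link french_translation/baie to it
    under `root`, mimicking one translation job. Returns the link path."""
    target = os.path.join(root, "english", "berry")
    link_dir = os.path.join(root, "french_translation")
    os.makedirs(target)
    os.makedirs(link_dir)
    link = os.path.join(link_dir, "baie")
    # Store a relative target, as `ln --symbolic --relative` does.
    os.symlink(os.path.relpath(target, link_dir), link)
    return link


def describe(path):
    """How os.path classifies `path`: isdir() follows symlinks, islink() does not."""
    return {"islink": os.path.islink(path), "isdir": os.path.isdir(path)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(describe(build_example(root)))  # {'islink': True, 'isdir': True}
```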
However, besides not being able to create this structure without running snakemake twice (and modifying the targets in between), even simply re-running the workflow results in an error:
Building DAG of jobs...
ChildIOException:
File/directory is a child to another output:
/tmp/snakemake.test/english/berry
/tmp/snakemake.test/english/berry/windu
or in running the translation rules again for no (good) reason:
Job counts:
count jobs
1 all
5 translation
6
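The absolute paths in the ChildIOException suggest a likely cause. The translation output french_translation/baie may be compared after resolving the symlink, at which point it becomes english/berry, a parent of the declared output english/berry/windu. The sketch models that guess with os.path.realpath. It is an assumption about the cause, not Snakemake's implementation, and is_child and child_conflicts are hypothetical names. Before the links exist there is no conflict, which is consistent with the first run succeeding and only re-runs failing.

```python
import os
import tempfile


def is_child(parent, child):
    """True if `child` lies strictly inside the directory `parent`."""
    parent, child = os.path.abspath(parent), os.path.abspath(child)
    return parent != child and os.path.commonpath([parent, child]) == parent


def child_conflicts(outputs):
    """(parent, child) pairs among `outputs` once symlinks are resolved.

    Assumption: this mimics resolving declared outputs with os.path.realpath
    before checking whether one output is nested inside another.
    """
    resolved = [os.path.realpath(p) for p in outputs]
    return [(a, b) for a in resolved for b in resolved if is_child(a, b)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        windu = os.path.join(root, "english", "berry", "windu")
        link = os.path.join(root, "french_translation", "baie")
        os.makedirs(os.path.dirname(windu))
        os.makedirs(os.path.dirname(link))
        open(windu, "w").close()
        print(child_conflicts([windu, link]))  # [] while the link does not exist yet
        os.symlink(os.path.join("..", "english", "berry"), link)
        # Now one pair: (.../english/berry, .../english/berry/windu)
        print(child_conflicts([windu, link]))
```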
So my question is: How can I implement the above logic in a working Snakefile?
Note that I am not looking for advice to change the data representation in the YAML file and/or the Snakefile. This is just an example to highlight (and isolate) an issue I encountered in a more complex scenario.
Sadly, I could not figure this out by myself so far, but I did manage to get a working GNU make version (even though the 'YAML parsing' is hackish at best):
### Setup ###

configfile := config.yaml

### Targets ###

votes := $(shell awk ' \
    NR == 1 { next } \
    /^[^ ]/ { exit } \
    NF == 1 { sub(":", "", $$1); dir = "english/" $$1 "/"; next } \
    { print dir $$2 } \
  ' '$(configfile)')

translations := $(shell awk ' \
    NR == 1 { next } \
    /^[^ ]/ { trans = 1; next } \
    ! trans { next } \
    { sub(":", "", $$1) } \
    NF == 1 { dir = $$1 "_translation/"; next } \
    { print dir $$1 } \
  ' '$(configfile)')

### Commands ###

create_file_cmd = touch '$@'
create_dir_cmd = mkdir --parent '$@'
relative_symlink_cmd = ln --symbolic --relative '$<' '$@'

### Rules ###

all : $(votes) $(translations)

$(sort $(dir $(votes) $(translations))) : % :
	$(create_dir_cmd)

$(foreach vote, $(votes), $(eval $(vote) : | $(dir $(vote))))

$(votes) : % :
	$(create_file_cmd)

translation_targets := $(shell awk ' \
    NR == 1 { next } \
    /^[^ ]/ { trans = 1; next } \
    ! trans { next } \
    NF != 1 { print "english/" $$2 "/"} \
  ' '$(configfile)')

define translation
$(word $(1), $(translations)) : $(word $(1), $(translation_targets)) | $(dir $(word $(1), $(translations)))
	$$(relative_symlink_cmd)
endef

$(foreach i, $(shell seq 1 $(words $(translations))), $(eval $(call translation, $(i))))
Running make on this works just fine:
mkdir --parent 'english/chocolate/'
touch 'english/chocolate/vader'
touch 'english/chocolate/luke'
touch 'english/chocolate/han'
mkdir --parent 'english/vanilla/'
touch 'english/vanilla/yoda'
touch 'english/vanilla/leia'
mkdir --parent 'english/berry/'
touch 'english/berry/windu'
mkdir --parent 'french_translation/'
ln --symbolic --relative 'english/chocolate/' 'french_translation/chocolat'
ln --symbolic --relative 'english/vanilla/' 'french_translation/vanille'
ln --symbolic --relative 'english/berry/' 'french_translation/baie'
mkdir --parent 'german_translation/'
ln --symbolic --relative 'english/chocolate/' 'german_translation/schokolade'
ln --symbolic --relative 'english/vanilla/' 'german_translation/vanille'
ln --symbolic --relative 'english/berry/' 'german_translation/beere'
The resulting tree is identical to the one shown above.
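The link targets in that tree follow from a relative-path computation: ln --relative stores the path of the target as seen from the directory that contains the link. A simplified model using os.path.relpath reproduces the ../english/<flavour> targets. relative_link_target is a hypothetical name. GNU ln may additionally resolve existing symlinks, which this sketch ignores.

```python
import os.path


def relative_link_target(target, link):
    """Text stored in `link` by `ln --symbolic --relative target link`.

    Simplified model: a pure path computation relative to the link's
    directory; GNU ln may additionally resolve existing symlinks.
    """
    return os.path.relpath(target, start=os.path.dirname(link) or ".")


if __name__ == "__main__":
    # Prints:
    # french_translation/baie -> ../english/berry
    # german_translation/schokolade -> ../english/chocolate
    for target, link in [("english/berry/", "french_translation/baie"),
                         ("english/chocolate/", "german_translation/schokolade")]:
        print(link, "->", relative_link_target(target, link))
```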
Also, running make again works as well:
make: Nothing to be done for 'all'.
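The awk programs in the Makefile rely on the exact layout of config.yaml. A line-by-line Python transcription of them makes it easier to check what $(votes), $(translations) and $(translation_targets) expand to. The sketch assumes two-space indentation and no blank lines or comments. parse_votes, parse_translations and is_top_level are hypothetical names. The translation links and their targets come out in the same file order. Pairing them by position, as the define/foreach construct does, therefore lines each link up with its english/<flavour>/ directory.

```python
# Hand-written copy of config.yaml; the line-based parsing below depends on
# this exact layout (two-space indentation, no blank lines or comments).
CONFIG_TEXT = """\
flavours:
  chocolate:
    - vader
    - luke
    - han
  vanilla:
    - yoda
    - leia
  berry:
    - windu
translations:
  french:
    chocolat: chocolate
    vanille: vanilla
    baie: berry
  german:
    schokolade: chocolate
    vanille: vanilla
    beere: berry
"""


def is_top_level(line):
    """awk's /^[^ ]/: the line starts with a character other than a space."""
    return line[:1] not in ("", " ")


def parse_votes(text):
    """Transcription of the awk program behind $(votes)."""
    votes, directory = [], ""
    for line in text.splitlines()[1:]:  # NR == 1 { next }
        if is_top_level(line):          # /^[^ ]/ { exit }
            break
        fields = line.split()
        if len(fields) == 1:            # "  chocolate:" starts a new flavour
            directory = "english/" + fields[0].replace(":", "", 1) + "/"
        else:                           # "    - vader": field 2 is the voter
            votes.append(directory + fields[1])
    return votes


def parse_translations(text):
    """Transcription of the awk programs behind $(translations) and
    $(translation_targets); both lists are returned in file order."""
    links, targets = [], []
    directory, in_translations = "", False
    for line in text.splitlines()[1:]:
        if is_top_level(line):          # "translations:" switches sections
            in_translations = True
            continue
        if not in_translations:
            continue
        fields = line.split()
        if len(fields) == 1:            # "  french:" starts a new language
            directory = fields[0].replace(":", "", 1) + "_translation/"
        else:                           # "    chocolat: chocolate"
            links.append(directory + fields[0].replace(":", "", 1))
            targets.append("english/" + fields[1] + "/")
    return links, targets


if __name__ == "__main__":
    print(parse_votes(CONFIG_TEXT))
    links, targets = parse_translations(CONFIG_TEXT)
    # Pair the lists by position, like $(word $(1), ...) in the define block.
    for link, target in zip(links, targets):
        print(link, "<-", target)
```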
So I really hope the solution is not to go back to old-fashioned GNU make with all the unreadable hacks I internalized over the years, but that there is a way to convince Snakemake to do what I spelled out as well. ;-)
Just in case it is relevant: This was tested using Snakemake version 5.7.132.2.
Edits:
Changed relative_symlink_cmd as per @Nick's comment.
I wanted to test with a newer version of Snakemake (5.20.1), and I came up with something similar to the answer proposed by Manalavan Gajapathy:
### Setup ###

configfile: "config.yaml"

VOTERS = list({voter for flavour in config["flavours"].keys() for voter in config["flavours"][flavour]})

### Targets ###

votes = ["english/" + flavour + "/" + voter
         for flavour, voters in config["flavours"].items()
         for voter in voters]

translations = {language + "_translation/" + translation
                for language, translations in config["translations"].items()
                for translation in translations.keys()}

### Commands ###

create_file_cmd = "touch '{output}'"

relative_symlink_cmd = "ln --symbolic --relative $(dirname '{input}') '{output}'"

### Rules ###

rule all:
    input: votes, translations

rule english:
    output: "english/{flavour}/{voter}"
    # To avoid considering ".done" as a voter
    wildcard_constraints:
        voter="|".join(VOTERS),
    shell: create_file_cmd

def get_voters(wildcards):
    return [f"english/{wildcards.flavour}/{voter}" for voter in config["flavours"][wildcards.flavour]]

rule flavour:
    input: get_voters
    output: "english/{flavour}/.done"
    shell: create_file_cmd

rule translation:
    input: lambda wc: "english/" + config["translations"][wc.lang][wc.trans] + "/.done"
    output: directory("{lang}_translation/{trans}")
    shell: relative_symlink_cmd
This runs and creates the desired output, but fails with ChildIOException when re-run (even though there is nothing left to be done).
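The wildcard_constraints entry in rule english matters because, with the default wildcard regex, english/{flavour}/{voter} also matches the flag file english/{flavour}/.done, so two rules could claim it. The sketch illustrates this with a simplified model of wildcard matching. It assumes that each wildcard matches .+ unless constrained, and that a constraint is used as the body of the wildcard's group. Both are simplifications, not Snakemake's actual code. output_regex, FLAVOURS and the three compiled patterns are hypothetical names.

```python
import re


def output_regex(pattern, constraints=None):
    """Simplified model of a Snakemake output pattern as an anchored regex.

    Assumption: each {wildcard} becomes (?P<name>.+) unless constrained.
    """
    constraints = constraints or {}
    parts, pos = [], 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        parts.append("(?P<%s>%s)" % (name, constraints.get(name, ".+")))
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts) + "$")


FLAVOURS = {"chocolate": ["vader", "luke", "han"],
            "vanilla": ["yoda", "leia"],
            "berry": ["windu"]}
# Same construction as VOTERS in the Snakefile (order does not matter here).
VOTERS = list({voter for flavour in FLAVOURS for voter in FLAVOURS[flavour]})

unconstrained = output_regex("english/{flavour}/{voter}")
constrained = output_regex("english/{flavour}/{voter}",
                           {"voter": "|".join(VOTERS)})
flag = output_regex("english/{flavour}/.done")

if __name__ == "__main__":
    path = "english/vanilla/.done"
    print(bool(unconstrained.match(path)))  # True: rule english would claim it too
    print(bool(constrained.match(path)))    # False: only rule flavour matches
    print(bool(flag.match(path)))           # True
```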