In a data processing project with several steps, using Snakemake, there is a Python Jupyter Notebook in a subdirectory that processes some data:
Notebook processing_step_1/process.ipynb
contains:
with open('input.csv') as infile:
for line in infile:
print(line)
Data file processing_step_1/input.csv
contains:
one,two,three
1,2,3
And this is the Snakefile
using the notebook
:
rule process_data:
input:
"processing_step_1/input.csv",
notebook:
"processing_step_1/process.ipynb"
If I run the notebook interactively, or from the command line like this
jupyter nbconvert --execute --to notebook processing_step_1/process.ipynb
it works. The working directory is set to the directory of the notebook and the input file can be found with a relative path.
When running from Snakemake, though, using
snakemake -c1
I get an error message
FileNotFoundError: [Errno 2] No such file or directory: 'input.csv'
and the reason for that is that the notebook is copied and executed in a different directory, as can be seen from the Snakemake error message:
Command 'set -euo pipefail; jupyter-nbconvert --log-level ERROR --execute --to notebook --ExecutePreprocessor.timeout=-1 /path/to/project/.snakemake/scripts/tmp9mmr8k20.process.ipynb' returned non-zero exit status 1.
What is the canonical way of loading data files from the same directory as the notebook when using Snakemake?
I would like to still be able to use the same notebook standalone without Snakemake. So preferably I wouldn’t like to add Snakemake-specific code to it.
It seems to be impossible to find the directory containing the notebook from within the notebook. See e.g. https://stackoverflow.com/a/52119628/381281. Also I couldn’t find a way to set a working directory per rule in Snakemake.
The solution by @hfs (OP) is one way to resolve this, but another way is to avoid hardcoding the file paths within the notebook:
# with open('input.csv') as infile: <- this is hard-coded
with open(snakemake.input[0]) as infile: # this is flexible
...
Note that for this solution to work, the notebook
directive should be used instead of the shell
-nbconvert
combination.