
How to load files in a notebook when using Snakemake?


In a multi-step data processing project managed with Snakemake, a Python Jupyter notebook in a subdirectory processes some data:

Notebook processing_step_1/process.ipynb contains:

with open('input.csv') as infile:
    for line in infile:
        print(line)

Data file processing_step_1/input.csv contains:

one,two,three
1,2,3

And this is the Snakefile that uses the notebook:

rule process_data:
    input:
        "processing_step_1/input.csv",
    notebook:
        "processing_step_1/process.ipynb"

If I run the notebook interactively, or from the command line like this:

jupyter nbconvert --execute --to notebook processing_step_1/process.ipynb

it works. The working directory is set to the directory of the notebook and the input file can be found with a relative path.

When I run it through Snakemake, though, using

snakemake -c1

I get this error message:

FileNotFoundError: [Errno 2] No such file or directory: 'input.csv'

The reason is that Snakemake copies the notebook to a temporary file under .snakemake/scripts/ and executes that copy, so it no longer runs from processing_step_1/. This can be seen in the Snakemake error message:

Command 'set -euo pipefail;  jupyter-nbconvert --log-level ERROR --execute  --to notebook --ExecutePreprocessor.timeout=-1 /path/to/project/.snakemake/scripts/tmp9mmr8k20.process.ipynb' returned non-zero exit status 1.
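The underlying mechanism is ordinary relative-path resolution: open('input.csv') is resolved against the process's current working directory, not against the location of the notebook file. The minimal sketch below reproduces this in plain Python, without Snakemake or Jupyter. It assumes two temporary directories as hypothetical stand-ins (step_dir for processing_step_1/, other_dir for wherever the copy runs), and try_open is a hypothetical helper.

```python
import os
import tempfile


def try_open(relative_path: str) -> bool:
    """Return True if relative_path can be opened from the current working directory."""
    try:
        with open(relative_path) as infile:
            infile.read()
        return True
    except FileNotFoundError:
        return False


original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as step_dir, tempfile.TemporaryDirectory() as other_dir:
    # Hypothetical stand-in for processing_step_1/input.csv
    with open(os.path.join(step_dir, "input.csv"), "w") as f:
        f.write("one,two,three\n1,2,3\n")
    try:
        os.chdir(step_dir)    # cwd = notebook's directory, as with plain nbconvert
        found_in_step_dir = try_open("input.csv")
        os.chdir(other_dir)   # cwd = some other directory, as when a copy runs elsewhere
        found_elsewhere = try_open("input.csv")
    finally:
        os.chdir(original_cwd)  # restore cwd before the temp dirs are deleted

print(found_in_step_dir, found_elsewhere)
```

The same relative path succeeds in one working directory and fails in the other, which is exactly the FileNotFoundError above.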

What is the canonical way of loading data files from the same directory as the notebook when using Snakemake?

I would still like to be able to use the same notebook standalone, without Snakemake, so I would prefer not to add Snakemake-specific code to it.

It seems to be impossible to determine, from within a notebook, the directory that contains it; see e.g. https://stackoverflow.com/a/52119628/381281. I also couldn't find a way to set a per-rule working directory in Snakemake.


Solution

  • The solution by @hfs (OP) is one way to resolve this, but another way is to avoid hardcoding the file paths within the notebook:

    # with open('input.csv') as infile:  <- hard-coded relative path
    with open(snakemake.input[0]) as infile:  # path taken from the rule's input
        for line in infile:
            print(line)
    

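    Note that this only works when the rule uses the notebook directive, as in the Snakefile in the question. The snakemake object is only available when Snakemake runs the notebook itself, not when a shell command calls jupyter nbconvert.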
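To keep the notebook usable both standalone and under Snakemake, one option is to fall back to the hard-coded path when no snakemake object exists. This is a sketch, not part of the original answer. It assumes that the notebook directive makes snakemake available as a global variable in the notebook (which is how snakemake.input[0] above works). input_path is a hypothetical helper, not a Snakemake API, and the SimpleNamespace object only simulates a Snakemake run for illustration.

```python
from types import SimpleNamespace


def input_path(default: str, index: int = 0) -> str:
    """Return the rule's input path under Snakemake, else a standalone default.

    Hypothetical helper: it looks up a global named `snakemake` in the
    namespace where this function is defined. When the notebook runs
    standalone, that name is undefined and `default` is returned.
    """
    smk = globals().get("snakemake")
    if smk is not None:
        return str(smk.input[index])
    return default


# Standalone run: no `snakemake` global, so the relative default is used.
standalone = input_path("input.csv")

# Simulated Snakemake run: a stand-in object with an `input` list.
snakemake = SimpleNamespace(input=["processing_step_1/input.csv"])
under_snakemake = input_path("input.csv")
del snakemake  # remove the stand-in again

print(standalone, under_snakemake)
```

In the notebook itself, the read would then become open(input_path("input.csv")). The helper has to be defined in the notebook rather than imported from a module, because globals() refers to the namespace of the module where the function is defined. This adds a small amount of Snakemake-aware code, but the notebook keeps working without Snakemake.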