The context:
My team has a repo with lots of Jupyter notebooks in it. We often work with sensitive data in these notebooks, and we want to make sure that we never accidentally commit cell outputs to git.
To make that work, I added this filter to git config:
git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
This filter is then called in a repo-level gitattributes file like this: *.ipynb filter=strip-notebook-output
This works great: we add and commit Jupyter notebooks as usual, without needing to remember to strip out cell outputs each time.
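To confirm that the attribute really applies to a given notebook, git can report it directly. The following is a minimal sketch in a scratch repository; the directory and the file name notebook.ipynb are placeholders, not part of the real project:

```shell
# Scratch repository so nothing in a real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Same attribute line as in the repo-level gitattributes file.
echo '*.ipynb filter=strip-notebook-output' > .gitattributes

# Ask git which filter is assigned to a (hypothetical) notebook path.
# The file does not need to exist for check-attr to answer.
attr_report=$(git check-attr filter -- notebook.ipynb)
echo "$attr_report"
```

If the pattern matches, the report names strip-notebook-output as the filter for that path.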
The problem:
Sometimes we want to keep the cell outputs in a specific notebook and add it to the git index with all outputs intact. Essentially, we want to be able to override the gitattributes filter in specific cases, while having the filter run by default at all other times. Is this possible? If yes, how can I implement it?
I've tried googling how to do this, and haven't found an answer so far.
I could just remove the gitattributes filter and tell my team to run nbconvert with output clearing enabled on every notebook before adding it to git, except when they deliberately want to keep cell outputs. That is a very risky option, though, because people are likely to forget or make mistakes (some of my team members are new to git and version control). I am hoping for a solution that clears notebook cell outputs by default when adding to the git index, and in exceptional cases lets us add notebooks with their cell outputs intact.
Try
git -c filter.strip-notebook-output.clean= add path/to/a/notebook
The clean command is disabled for this git add, so the content is added as is. All cells are affected. If you want some of the cells to be intact and the others to be cleaned, this method cannot work.
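The effect can be checked in a scratch repository. This is only a sketch: a sed command stands in for the real nbconvert filter so that it runs without Jupyter installed, and the file names default.ipynb and keep.ipynb are made up for the demo.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Stand-in clean filter (assumption): sed replaces the nbconvert command.
git config filter.strip-notebook-output.clean "sed 's/secret-output/STRIPPED/'"
echo '*.ipynb filter=strip-notebook-output' > .gitattributes

# Two fake "notebooks" with identical content.
echo 'secret-output' > default.ipynb
echo 'secret-output' > keep.ipynb

# Normal add: the clean filter runs on the way into the index.
git add default.ipynb

# Override: an empty clean command disables the filter for this one invocation.
git -c filter.strip-notebook-output.clean= add keep.ipynb

# Inspect what was actually staged for each file.
staged_default=$(git show :default.ipynb)
staged_keep=$(git show :keep.ipynb)
echo "default.ipynb staged as: $staged_default"
echo "keep.ipynb staged as: $staged_keep"
```

The override only lasts for that single command. A later plain git add of the same notebook, for example after editing it, runs the clean filter again and strips the outputs, so the -c option has to be repeated each time the outputs should be kept.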