I'm using the Jupyter extension (v2022.9.1303220346) in Visual Studio Code (v1.73.1).
To reproduce this issue, make any modification to the notebook and check it into git. You'll observe that you get an extra difference for execution_count. For example (display from Git Gui):
- "execution_count": 7,
+ "execution_count": 9,
The execution count doesn't appear to be useful and is noise in the git history. Can Jupyter or VS Code be configured to stop updating this value or (better) ignore it altogether?
Can Jupyter or VS Code be configured to stop updating this value or (better) ignore it altogether?
I'm not sure about VS Code, but after reading some discussions in GitHub feature-request issues for Jupyter notebooks, I think the answer for VS Code config options is probably no. The fact that these are still feature requests suggests to me that the answer is currently no for Jupyter as well, but the discussions also show that there are plenty of approaches to tackling the problem:
In jupyter/notebook: Suggestion: Separate file for notebook executed cell outputs. #5677
I think it would be nice to have a separate file (something like .ipynb.output) that links output to their cells in the .ipynb json file. This would make it significantly easier to exclude notebook outputs in source control systems like git. - jbursey
It's not a bad idea. But if keeping cell output out of source control is your primary concern, the easiest solution is to just clear the outputs before committing. There are a few ways to do that:
Use a commit hook as outlined in Jupyter docs.
Use Jupyter's shortcut to "clear all cell output"
Use nbconvert to clear the notebook outputs before committing.
You could also just write your own shell script to clear outputs. I wrote one using jq to do that and it is fairly easy.
Some folks also choose to just convert the notebook to python using nbconvert and then just commit that. If you search for "How to version control jupyter notebooks" you will see a bunch of posts on the topic.
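As a rough illustration of the "write your own script" option quoted above, here is a minimal Python sketch (not the jq shell script the commenter describes) that clears outputs and resets execution_count using only the standard library. The names clear_outputs and clear_file are hypothetical, and the sketch assumes the notebook uses the nbformat 4 layout with a top-level "cells" list.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Empty the outputs and reset execution_count on every code cell (in place)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None  # serialized as JSON null
    return notebook


def clear_file(path) -> None:
    """Rewrite the .ipynb file at `path` with outputs and counts cleared."""
    path = Path(path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    clear_outputs(notebook)
    # Assumption: one-space indentation and a trailing newline are close to
    # what Jupyter itself writes, which keeps unrelated diff noise small.
    text = json.dumps(notebook, indent=1, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")


# Example usage (hypothetical file name), run before committing:
# clear_file("analysis.ipynb")
```

Running something like this before each commit removes the execution_count changes from the diff, at the cost of also dropping the saved outputs.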
Alternatively, Jupytext could be helpful for your case. It allows you to save notebooks as code. Then you only need to commit the code to git, whilst you can ignore the notebooks for version control.
Their paired notebooks avoid the need for automatically saving and converting the notebooks.
In jupyterlab/jupyterlab: Using a notebook & git creates too many diff #9444
It would be much simpler if we had an option to save only the input cells, not the output ones. And to reset the cell index (execution_count) to 0 without restarting the kernel. - sylvain-bougnoux
I think that you can configure the underlying nbdiff to ignore outputs, see: https://nbdime.readthedocs.io/en/latest/config.html#configuring-ignores - krassowski
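To illustrate the idea of a diff that ignores outputs and execution counts, here is a minimal standard-library sketch. It is not nbdime and does not use nbdime's configuration; it just compares the cell sources of two already-loaded notebooks. The function names source_lines and diff_sources are hypothetical, and it assumes the nbformat 4 layout where "source" is either a string or a list of strings.

```python
import difflib


def source_lines(notebook: dict) -> list[str]:
    """Flatten a notebook into text lines, keeping only the cell sources."""
    lines = []
    for index, cell in enumerate(notebook.get("cells", [])):
        # A marker line per cell so the diff shows where each cell starts.
        lines.append(f"# --- cell {index} ({cell.get('cell_type')}) ---")
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        lines.extend(source.splitlines())
    return lines


def diff_sources(old: dict, new: dict) -> list[str]:
    """Unified diff of cell sources only; outputs and counts are never compared."""
    return list(
        difflib.unified_diff(
            source_lines(old),
            source_lines(new),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )
```

With this approach, two notebooks that differ only in execution_count or outputs produce an empty diff, which is the behaviour the question is after when reviewing changes.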
In jupyterlab/jupyterlab-git: Cleaning Notebook cell outputs #392
Notebooks cell outputs can be a hindrance in Version Control while reviewing the diff of a commit to see what changed (either in a PR or historically)
Some ideas on how we could enable users to deal with outputs in cell in jupyterlab-git
- Enable a Command Palette option to easily install a Git filter with nbstripout
- Prompt the user to remove outputs from cells if we detect that there are cell outputs during a git push
- Use the JupyterLab settings registry to let the user specify that all Notebook outputs must be cleaned on a git push
With #700, it is now possible to add nbstripout (for example) when initializing a git repository. - fcollonval
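For reference, a Git filter of the kind mentioned above is a program that reads the file on stdin and writes the cleaned version to stdout. The following is a minimal sketch of that mechanism, not nbstripout itself: it only resets execution_count and keeps the outputs. The script name reset_counts.py, the filter name nbcounts, and the function names are all hypothetical, and the nbformat 4 layout is assumed.

```python
import json


def reset_execution_counts(notebook: dict) -> dict:
    """Set execution_count to null on code cells and their execute_result outputs."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        for output in cell.get("outputs", []):
            if output.get("output_type") == "execute_result":
                output["execution_count"] = None
    return notebook


def filter_stream(src, dst) -> None:
    """Read notebook JSON from `src` and write the cleaned JSON to `dst`."""
    notebook = json.load(src)
    reset_execution_counts(notebook)
    json.dump(notebook, dst, indent=1, ensure_ascii=False)
    dst.write("\n")


# To use this as a Git clean filter, save it as e.g. reset_counts.py and end
# the file with:
#     import sys
#     filter_stream(sys.stdin, sys.stdout)
# then register it (the filter name is arbitrary):
#     git config filter.nbcounts.clean "python reset_counts.py"
# and add this line to .gitattributes:
#     *.ipynb filter=nbcounts
```

Because the clean filter runs when a file is staged, the working copy keeps its real execution counts while the committed version always has null, so the counts no longer show up in diffs.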
For your learning purposes / reference, I found this info by googling "github issues jupyter notebook put execution_count in separate file" and looking through the top search results and linked GitHub issues in their discussion threads.
Someone in one of the issue tickets mentioned Jupytext's "paired notebooks", which allow pairing a text notebook with an .ipynb file, with the intention that the text notebook be used for version control. I have no affiliation with this extension and have not tried it. Just mentioning it in case you find it useful.