python git visual-studio-code jupyter-notebook

How can I configure my tools to ignore or prevent updates to the execution_count field in a Jupyter Notebook?


I'm using the Jupyter extension (v2022.9.1303220346) in Visual Studio Code (v1.73.1).

To reproduce this issue, make any modification to the notebook and check it into git. You'll observe that you get an extra difference for execution_count. For example (display from Git Gui):

-   "execution_count": 7,
+   "execution_count": 9,

The execution count doesn't appear to be useful and is noise in the git history. Can Jupyter or VS Code be configured to stop updating this value or (better) ignore it altogether?


Solution

  • Can Jupyter or VS Code be configured to stop updating this value or (better) ignore it altogether?

    I'm not sure about VS Code, but I suspect the answer is no. The relevant discussions I found on GitHub are feature requests for Jupyter notebooks, and the fact that they are still feature requests suggests that no built-in option exists yet. Those discussions do describe several approaches to tackling the problem:

    • In jupyter/notebook: Suggestion: Separate file for notebook executed cell outputs. #5677

      I think it would be nice to have a separate file (something like .ipynb.output) that links output to their cells in the .ipynb json file. This would make it significantly easier to exclude notebook outputs in source control systems like git. - jbursey

      It's not a bad idea. But if keeping cell output out of source control is your primary concern, the easiest solution is to just clear the outputs before committing. There are a few ways to do that:

      Use a commit hook as outlined in Jupyter docs.

      Some folks also choose to just convert the notebook to python using nbconvert and then just commit that. If you search for "How to version control jupyter notebooks" you will see a bunch of posts on the topic.

      gitjeff05

      Alternatively, Jupytext could be helpful for your case. It allows you to save notebooks as code. Then you only need to commit the code to git, whilst you can ignore the notebooks for version control.

      Their paired notebooks avoid the need for automatically saving and converting the notebooks.

      IvoMerchiers

    • In jupyterlab/jupyterlab: Using a notebook & git creates too many diff #9444

      It would be much simpler if we had an option to save only the input cells, not the output ones. And to reset the cell index (execution_count) to 0 without restarting the kernel. - sylvain-bougnoux

      I think that you can configure the underlying nbdiff to ignore outputs, see: https://nbdime.readthedocs.io/en/latest/config.html#configuring-ignores - krassowski

    • In jupyterlab/jupyterlab-git: Cleaning Notebook cell outputs #392

      Notebook cell outputs can be a hindrance in version control while reviewing the diff of a commit to see what changed (either in a PR or historically).

      Some ideas on how we could enable users to deal with cell outputs in jupyterlab-git:

      1. Enable a Command Palette option to easily install a Git filter with nbstripout
      2. Prompt the user to remove outputs from cells if we detect that there are cell outputs during a git push
      3. Use the JupyterLab settings registry to let the user specify that all Notebook outputs must be cleaned on a git push

      jaipreet-s

      With #700, it is now possible to add nbstripout (for example) when initializing a git repository. - fcollonval
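To make the "Git filter" idea from the quotes above more concrete, here is a minimal sketch of what such a filter does conceptually. It is not nbstripout's actual implementation, just the same idea written with the standard library. The function name `clean`, the `keep_outputs` switch, and the script and filter names in the comments are all hypothetical, and the code assumes the nbformat 4 layout (a top-level `cells` list whose code cells have `execution_count` and `outputs`).

```python
import io
import json


def clean(stream_in, stream_out, keep_outputs=False):
    """Copy a notebook from stream_in to stream_out without execution counts."""
    notebook = json.load(stream_in)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None  # serialized as JSON null
        if keep_outputs:
            # "execute_result" outputs carry their own copy of the counter.
            for output in cell.get("outputs", []):
                if "execution_count" in output:
                    output["execution_count"] = None
        else:
            cell["outputs"] = []
    # indent=1 is an assumption, chosen to resemble how notebooks are usually saved.
    json.dump(notebook, stream_out, indent=1, ensure_ascii=False)
    stream_out.write("\n")


# Quick in-memory demonstration with a one-cell notebook.
demo_in = io.StringIO(json.dumps({
    "cells": [{
        "cell_type": "code",
        "execution_count": 9,
        "metadata": {},
        "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
        "source": ["print('hi')"],
    }],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}))
demo_out = io.StringIO()
clean(demo_in, demo_out)
cleaned = json.loads(demo_out.getvalue())

# Hypothetical wiring as a Git clean filter, assuming the function is saved in
# strip_nb.py and the script ends with: import sys; clean(sys.stdin, sys.stdout)
#   git config filter.strip-nb.clean "python strip_nb.py"
#   echo "*.ipynb filter=strip-nb" >> .gitattributes
```

With a clean filter, Git pipes the working-tree file through the command when staging, so the copy on disk keeps its counters and outputs while the committed version does not. Passing `keep_outputs=True` would address only the execution_count noise from the question and leave outputs in the commit.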

    For reference, I found this info by googling "github issues jupyter notebook put execution_count in separate file" and looking through the top search results and the GitHub issues linked in their discussion threads.

    One commenter in the issue threads mentioned Jupytext's "paired notebooks", which pair a text notebook with an .ipynb file, with the intention that the text notebook be used for version control. I have no affiliation with Jupytext and have not tried it. I'm just mentioning it in case you find it useful.
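To illustrate why a text version of a notebook sidesteps the execution_count problem, here is a rough sketch that renders a notebook's cells as a plain Python script. It is loosely modelled on the `# %%` "percent" cell markers and is not how Jupytext itself is implemented. The function name `notebook_to_percent_script` is hypothetical, and the nbformat 4 layout is assumed.

```python
import json


def notebook_to_percent_script(notebook: dict) -> str:
    """Render code and markdown cells as a '# %%'-delimited Python script."""
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + text)
        elif cell.get("cell_type") == "markdown":
            # Markdown is kept as comments so the script stays valid Python.
            commented = "\n".join(("# " + line).rstrip() for line in text.split("\n"))
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"


# A tiny example notebook: one markdown cell and one executed code cell.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
            "source": ["x = 1\n", "print(x)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}
script = notebook_to_percent_script(example)
```

Only the cell sources survive the conversion, so re-running cells changes nothing in the text file, and committing that file instead of the .ipynb keeps counters and outputs out of the history entirely.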