I have a Jupyter notebook with ~400 cells. The total file size is 8 MB, so I'd like to suppress the largest output cells to reduce the overall file size.
There are quite a few output cells that could be causing this (mainly matplotlib and seaborn plots), so to avoid spending time on trial and error, is there a way of finding the size of each output cell? I'd like to keep as many output plots as possible, as I'll be pushing the work to GitHub for others to see.
My idea is to use nbformat to iterate over the cells in your notebook and check which ones have the largest base64 storage. Here it is spelled out as code to run in a Jupyter notebook cell; it lists the code cell numbers (execution counts) from largest output to smallest. It first fetches an example notebook so there is something to try it on. Once you have established that it works, place the script alongside your own .ipynb notebook file and substitute that file's name:
############### Get test notebook ########################################
import os
notebook_example = "matplotlib3d-scatter-plots.ipynb"
if not os.path.isfile(notebook_example):
    !curl -OL https://raw.githubusercontent.com/fomightez/3Dscatter_plot-binder/master/matplotlib3d-scatter-plots.ipynb
### Use nbformat to get estimate of output size from code cells. #########
import nbformat as nbf
ntbk = nbf.read(notebook_example, nbf.NO_CONVERT)
size_estimate_dict = {}
for cell in ntbk.cells:
    if cell.cell_type == 'code':
        size_estimate_dict[cell.execution_count] = len(str(cell.outputs))
out_size_info = [k for k, v in sorted(size_estimate_dict.items(), key=lambda item: item[1], reverse=True)]
out_size_info
(For a place to run that code easily, go here and click the launch binder button. When the session spins up, open a new notebook, paste in the code, and run it. A static form of the notebook is here.)
The example I tried didn't include Plotly, but the code seemed to work similarly on a notebook with all Plotly plots. I don't know how it will handle a mix, though; it may not sort perfectly if the outputs are of different kinds.
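If the ordering matters to you, a variation along these lines might help. It is only a sketch: the function name `output_sizes` and the demo notebook data are made up for illustration. It assumes an nbformat v4 notebook (a top-level `cells` list), measures each code cell's outputs by the length of their JSON serialization so every output type is sized the same way, and keys results by cell position rather than execution count, since cells that were never run have no execution count.

```python
import json


def output_sizes(nb):
    """Return (cell_index, execution_count, approx_bytes) per code cell, largest first."""
    sizes = []
    for index, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue
        # Length of the JSON-serialized outputs approximates their cost on disk.
        approx_bytes = len(json.dumps(cell.get("outputs", [])))
        sizes.append((index, cell.get("execution_count"), approx_bytes))
    return sorted(sizes, key=lambda item: item[2], reverse=True)


# Made-up notebook data, only here so the sketch runs on its own.
demo_nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {
            "cell_type": "code", "execution_count": 1, "metadata": {},
            "source": "print('hi')",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "hi\n"}],
        },
        {
            "cell_type": "code", "execution_count": 2, "metadata": {},
            "source": "make_plot()",
            # Stand-in for a base64-encoded PNG.
            "outputs": [{"output_type": "display_data", "metadata": {},
                         "data": {"image/png": "A" * 5000}}],
        },
        # A code cell that was never run: no execution count, no outputs.
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "source": "", "outputs": []},
    ],
}

ranked = output_sizes(demo_nb)
for index, count, approx_bytes in ranked:
    print(f"cell {index} (In [{count}]): ~{approx_bytes} bytes of output")
```

To try it on a real file, load the notebook with `json.load` or with `nbformat.read(path, as_version=4)`; either should give a mapping with a `cells` list that the function can walk.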
Hopefully this gives you an idea of how to do what you asked about. The code example could be expanded to use the retrieved size estimates to have nbformat make a copy of the input notebook without the output showing for, say, the ten code cells with the largest outputs.
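A rough sketch of that expansion could look like the following. The helper name `strip_largest_outputs`, the `top_n` parameter, and the demo data are all hypothetical, and it again assumes an nbformat v4 notebook. It works on a deep copy, so the notebook you loaded is left untouched, and it keeps each cell's source and execution count while clearing only the outputs.

```python
import copy
import json


def strip_largest_outputs(nb, top_n=10):
    """Return a copy of nb with outputs cleared for the top_n largest-output code cells."""
    stripped = copy.deepcopy(nb)
    code_cells = [c for c in stripped["cells"] if c["cell_type"] == "code"]
    # Rank code cells by the approximate serialized size of their outputs.
    code_cells.sort(key=lambda c: len(json.dumps(c.get("outputs", []))), reverse=True)
    for cell in code_cells[:top_n]:
        cell["outputs"] = []
    return stripped


def make_code_cell(count, payload_len):
    """Build a made-up code cell whose output holds payload_len placeholder characters."""
    return {
        "cell_type": "code", "execution_count": count, "metadata": {},
        "source": "make_plot()",
        "outputs": [{"output_type": "display_data", "metadata": {},
                     "data": {"image/png": "A" * payload_len}}],
    }


# Made-up notebook: three code cells with small, large, and medium outputs.
demo_nb = {
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [make_code_cell(1, 100), make_code_cell(2, 9000), make_code_cell(3, 800)],
}

slim_nb = strip_largest_outputs(demo_nb, top_n=1)
print([len(cell["outputs"]) for cell in slim_nb["cells"]])
```

With a real notebook you would read it with `nbformat.read(path, as_version=4)`, pass the result to the helper, and save the copy under a new name with `nbformat.write(slim_nb, new_path)`, so the original file with all its plots stays available.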