
How to export dask HTML high level graph to disk


There is a way to generate an HTML high-level graph in a Jupyter notebook, as shown in Dask's documentation: https://docs.dask.org/en/stable/graphviz.html#high-level-graph-html-representation

Taking the example from the docs, you put the following code in a Jupyter cell:

import dask.array as da
x = da.ones((15, 15), chunks=(5, 5))
y = x + x.T

y.dask  # shows the HTML representation in a Jupyter notebook

And you get a nice interactive HTML view of the graph in the Jupyter notebook.

My question is whether there is a way to get the HTML for this graph outside of the Jupyter context. My immediate interest is to export a static HTML file to disk as a record of the graph that was executed for a task. I could also see other applications, such as embedding a widget in a GUI.


Solution

  • By default, the Notebook interface displays the output of the _repr_html_() method of whatever object it is trying to display. In the case of a Dask Array, the dask attribute is an instance of HighLevelGraph, whose _repr_html_() implementation is here: https://github.com/dask/dask/blob/ed5f68897b3a097f7c5ec1a9ec13ce49c112a544/dask/highlevelgraph.py#L840

    That method should return a string, so you can call it directly and save the output to a file:

    from pathlib import Path

    # y is the Dask array from the question; _repr_html_() returns the HTML string
    Path("dask.html").write_text(y.dask._repr_html_(), encoding="utf-8")
    

    There are also ways to use the IPython APIs to go through the same process the Notebook kernel uses when it displays some data, but I didn't look those up 😀
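A minimal sketch of both routes, assuming IPython may or may not be installed. It tries IPython's `DisplayFormatter` (from `IPython.core.formatters`), which, as far as I know, runs the rich-display hooks (`_ipython_display_`, `_repr_mimebundle_`, `_repr_html_`, ...) and returns the results keyed by MIME type; without IPython it falls back to calling `_repr_html_()` directly, as above. The helper names `html_repr` and `save_html_repr` are hypothetical, and wrapping the fragment in a minimal UTF-8 page is my own choice, not something Dask does.

```python
from pathlib import Path
from typing import Any, Optional, Union


def html_repr(obj: Any) -> str:
    """Return the HTML representation of ``obj`` (hypothetical helper)."""
    html: Optional[str] = None
    try:
        # Assumption: DisplayFormatter applies roughly the same hook chain
        # (_ipython_display_, _repr_mimebundle_, _repr_html_, ...) as the kernel.
        from IPython.core.formatters import DisplayFormatter
    except ImportError:
        # Fallback without IPython: call the _repr_html_ hook directly.
        method = getattr(obj, "_repr_html_", None)
        if callable(method):
            html = method()
    else:
        # format() returns (data, metadata), both dicts keyed by MIME type.
        data, _metadata = DisplayFormatter().format(obj)
        html = data.get("text/html")

    if html is None:
        raise TypeError(f"{type(obj).__name__} has no HTML representation")
    return html


def save_html_repr(obj: Any, path: Union[str, Path]) -> Path:
    """Write obj's HTML repr to ``path`` as a standalone UTF-8 page."""
    # The repr is an HTML fragment, so wrap it in a minimal document
    # with an explicit charset before writing it to disk.
    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head>\n'
        f"<body>\n{html_repr(obj)}\n</body></html>\n"
    )
    out = Path(path)
    out.write_text(page, encoding="utf-8")
    return out
```

With the question's example, `save_html_repr(y.dask, "dask.html")` should write the graph to `dask.html`. I have not checked whether the page looks identical to the notebook rendering; that depends on whether the repr relies on CSS or JavaScript supplied by the notebook front end.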