
Python JupyterDash unable to access callback modified objects


Suppose I want to launch a Dash app from a Python class within a Jupyter notebook/lab, do some work in it, and finally access the modified object. I am falling short on this last task. Here is a minimal working example:

import pandas as pd
import numpy as np

from flask import Flask

import dash_bootstrap_components as dbc
from dash.dependencies import Input, Output, State
from dash import html, dcc

from jupyter_dash import JupyterDash

# Initialize flask server and dash app
server = Flask(__name__)

app = JupyterDash(
    __name__,
    server=server,
    external_stylesheets=[
        dbc.themes.BOOTSTRAP,
    ],
)


class DropCol:
  
    def __init__(self, server, app, mode="inline", debug=True):
        self.mode = mode
        self.debug = debug
        self.app = app
        self.server = server
        self.callbacks(self.app)
    

    def __call__(self, df: pd.DataFrame):
        
        col_options = [{"label": c, "value": c} for c in df.columns]
        data_store = dcc.Store(id="data-store", data=df.to_json(date_format="iso", orient="split"))
        dropdown = dcc.Dropdown(id="cols", options=col_options)
        
        self.app.layout = html.Div(
            id="layout",
            children=[
                data_store,
                dropdown
            ],
        )

        self.app.run_server(mode=self.mode, debug=self.debug, port="8000")
    
    def callbacks(self, app):
        """Initialize app callbacks"""

        @app.callback(
            Output("data-store", "data"),
            Input("cols", "value"),
            State("data-store", "data"),
            prevent_initial_call=True,
        )
        def on_col_selection(col_name, df_jsonified):
            
            df = (pd.read_json(df_jsonified, orient="split")
                  .drop(col_name, axis=1)
                )
            return df.to_json(date_format="iso", orient="split")
    
    @property    
    def get_data_frame(self):
        """property to retrieve dataframe from data-store"""
        df_json = self.app.layout['data-store'].data
        return pd.read_json(df_json, orient="split")


# Sample dataframe
df = pd.DataFrame({
    "a": np.arange(10),
    "b": np.random.randn(10)
})

col_dropper = DropCol(server, app, mode="inline")
col_dropper(df)
# Select one of the columns in this cell's output, then access the data in the next cell

col_dropper.get_data_frame.head(3)
|    |   a |         b |
|---:|----:|----------:|
|  0 |   0 |  1.0964   |
|  1 |   1 | -0.30562  |
|  2 |   2 |  1.34761  |

As you can see, the stored dataframe I can access still has all the columns, even after I select one in the app above.

If I select column b, the expected output is:

col_dropper.get_data_frame.head(3)
|    |   a |
|---:|----:|
|  0 |   0 |
|  1 |   1 |
|  2 |   2 |

Package versions I am using: dash==2.0.0, dash_bootstrap_components==1.0.1, flask==2.0.1, jupyter_dash==0.4.0


Solution

  • From what I can tell, the JupyterDash extension just spawns a server from within Jupyter, so I believe you are running up against how Dash handles server state, the same as in a plain Dash app. You are essentially trying to access an updated component value outside of the server context, which is not possible (see answer here). Even though you are running from Jupyter, it is still a self-contained server that the client (which is you, in the next Jupyter cell) cannot access dynamically. As written, your get_data_frame will only ever see the df you built the layout with.

    To get around this, you need to store your updated dataframe in some persistent form available outside of the app. How you do that depends on your use case, but basically, every time on_col_selection is triggered, you need to write your dataframe to something outside of the app. For example, the following reproduces the behavior you're looking for:

        def callbacks(self, app):
            """Initialize app callbacks"""
    
            @app.callback(
                Output("data-store", "data"),
                Input("cols", "value"),
                State("data-store", "data"),
                prevent_initial_call=True,
            )
            def on_col_selection(col_name, df_jsonified):
                
                df = (pd.read_json(df_jsonified, orient="split")
                      .drop(col_name, axis=1)
                    )
                # Persist the updated frame outside the app so the notebook can read it back
                df.to_csv("/path/to/some_hidden_file_somewhere.csv", index=False)
                return df.to_json(date_format="iso", orient="split")
                
        @property
        def get_data_frame(self):
            """property to retrieve the latest dataframe from the CSV written by the callback"""
            return pd.read_csv("/path/to/some_hidden_file_somewhere.csv")
    

    If you're going to share your code with others, you will probably want something more sophisticated to keep track of user-dependent files. For example, an on-disk Flask cache could work well here. Also check out Examples 3 and 4 on Dash's Sharing Data Between Callbacks page.

    Depending on what your design plans are, you might also want to look into Dash's DataTable for displaying and interacting with the dataframe from within the Dash app.
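
As a rough illustration of the "user-dependent files" idea mentioned above, here is a minimal sketch that keeps one CSV per session instead of a single hard-coded path. The names STORE_DIR, new_session_id, frame_path, save_frame and load_frame are hypothetical helpers invented for this example, not part of Dash or JupyterDash, and the temp-directory location is only an assumption. Any folder the notebook can read would do.

```python
import tempfile
import uuid
from pathlib import Path

import pandas as pd

# Hypothetical storage location (an assumption for this sketch).
STORE_DIR = Path(tempfile.gettempdir()) / "dropcol_store"


def new_session_id() -> str:
    """Return a random identifier used to tell users/sessions apart."""
    return uuid.uuid4().hex


def frame_path(session_id: str) -> Path:
    """Build the CSV path that belongs to one session."""
    return STORE_DIR / f"{session_id}.csv"


def save_frame(df: pd.DataFrame, session_id: str) -> Path:
    """Write the frame to the session's file (intended for use inside the callback)."""
    STORE_DIR.mkdir(parents=True, exist_ok=True)
    path = frame_path(session_id)
    df.to_csv(path, index=False)
    return path


def load_frame(session_id: str) -> pd.DataFrame:
    """Read the latest frame back (intended for use from a notebook cell)."""
    return pd.read_csv(frame_path(session_id))


session_id = new_session_id()
original = pd.DataFrame({"a": range(10), "b": [x / 2 for x in range(10)]})

# What the callback would do after a column is picked in the dropdown:
save_frame(original.drop("b", axis=1), session_id)

# What the next notebook cell would do:
restored = load_frame(session_id)
```

In the DropCol class, one way to wire this in would be to create the id once in __init__ (for example self.session_id = new_session_id()), call save_frame(df, self.session_id) inside on_col_selection, and have get_data_frame return load_frame(self.session_id). Note that a CSV round trip does not preserve every dtype (dates come back as strings unless parsed), so a different on-disk format may suit some dataframes better.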