Search code examples
pythonbokehstacked-bar-chartpandas-bokeh

How to Create a Dynamic Stacked Bar Chart in Bokeh with Variable Excel Inputs and Outputs


I am currently working on a data visualisation project. I need to create a stacked bar chart using Bokeh. The data is sourced from an Excel file that is updated regularly and contains multiple inputs and outputs. The structure of the data can vary, meaning that the number of inputs and outputs can change. My aim is to ensure that the Bokeh chart can adapt to these changes automatically without requiring manual code adjustments. The stacked bar chart should be able to dynamically adjust to changes in the number of inputs and outputs in the Excel file. It should be capable of visualising various combinations of fixed inputs (e.g. input 1 fixed, input 2 fixed, input 3 variable) while displaying all corresponding outputs. Ideally, the solution would automatically read the Excel file, detect the structure of the inputs and outputs, and update the chart accordingly.

I have tried to visualize the stacked bar charts for a set scenario ( 2 Inputs and 3 Outputs). in the example excel file the data was stored in a scheme like this this:

Example Scheme:

Input_1 Input_2 Output_1 Output_2 Output_3
1 1 100 200 200
2 1 150 150 200
3 1 200 100 200
1 2 200 200 100
2 2 150 200 150
3 2 100 200 200
1 3 200 100 200
2 3 200 150 150
3 3 200 200 100

It worked for this static scenario:

import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.layouts import gridplot, row
from bokeh.models import ColumnDataSource


data_frame = pd.read_excel("example.xlsx")

data_frame['Input_1'] = data_frame['Input_1'].astype(str)
data_frame['Input_2'] = data_frame['Input_2'].astype(str)

output_file("stacked_bar_charts.html")

unique_input_1 = data_frame['Input_1'].unique()
unique_input_2 = data_frame['Input_2'].unique()

plots_for_input_1 = []
plots_for_input_2 = []


for value in unique_input_1:

    filtered_data = data_frame[data_frame["Input_1"] == value]
    filtered_data = filtered_data.sort_values(by='Input_2')

    source = ColumnDataSource(filtered_data)

    plot = figure(title=f"Input_1 = {value} fixed",
                  x_range=filtered_data["Input_2"].unique(),
                  height=300,
                  width=500
                  )

    plot.xaxis.axis_label = "Input_2"

    plot.vbar_stack(stackers=["Output_1", "Output_2", "Output_3"],
                    x="Input_2",
                    width=0.9,
                    color=["orange", "gray", "brown"],
                    source=source,
                    legend_label=["Output 1", "Output 2", "Output 3"]
                    )

    plots_for_input_1.append(plot)


for value in unique_input_2:

    filtered_data = data_frame[data_frame['Input_2'] == value]
    filtered_data = filtered_data.sort_values(by='Input_1')

    source = ColumnDataSource(filtered_data)

    plot = figure(title=f"Input_2 = {value} fixed", 
                  x_range=filtered_data["Input_1"].unique(), 
                  height=300,
                  width=500
                  )

    plot.xaxis.axis_label = "Input_1"

    plot.vbar_stack(stackers=["Output_1", "Output_2", "Output_3"],
                    x="Input_1",
                    width=0.9,
                    color=["orange", "gray", "brown"],
                    source=source,
                    legend_label=["Output 1", "Output 2", "Output 3"]
                    )

    plots_for_input_2.append(plot)


grid_for_input_1 = gridplot(plots_for_input_1, ncols=1)  
grid_for_input_2 = gridplot(plots_for_input_2, ncols=1)

final_layout = row(grid_for_input_1, grid_for_input_2)


show(final_layout)

Here are output example pictures to show how I aim to visualize the data:

bokeh example plot 1:

bokeh example plot 1

bokeh example plot 2:

bokeh example plot 2

I was unable to find a dynamic approach that works for changing inputs and outputs, such as varying two inputs while keeping the others constant and visualizing the impacts on all outputs through stacked bar charts. Additionally, the approach should efficiently adapt to data updates without requiring manual code adjustments for each new scenario.


Solution

  • You can use a groupby inside a nested-loop to plot all the possible I/O combinations :

    output_file("stacked_bar_charts.html")
    
    grids = []
    for inp_col in inp_cols:
        inplots = []
        for name, sub_df in data_frame.groupby(inp_col):
            for inp_diff in inp_cols.difference([inp_col], sort=False):
                plot = figure(
                    title=TITLE(inp_col, name),
                    x_range=data_frame[inp_diff].unique(),
                    height=H, width=W,
                )
    
                plot.xaxis.axis_label = inp_diff
    
                _ = plot.vbar_stack(
                    stackers=out_cols, x=inp_diff,
                    width=BAR_WIDTH, color=COLORS,
                    source=ColumnDataSource(sub_df.sort_values(inp_diff)),
                    legend_label=out_cols.tolist(),
                )
    
                inplots.append(plot)
        grids.append(gridplot(inplots, ncols=NCOLS))
    
    final_layout = row(*grids)
    
    show(final_layout)
    

    Output (stacked_bar_charts.html) :

    enter image description here

    Used configuration :

    import pandas as pd
    from bokeh.layouts import gridplot, row
    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure, output_file, show
    
    # PD-PREPROCESS
    data_frame = pd.read_excel("example.xlsx")
    
    inp_cols = data_frame.filter(like="Input").columns
    out_cols = data_frame.filter(like="Output").columns
    
    data_frame = data_frame.astype(dict.fromkeys(inp_cols, str))
    
    # BOKEH-CONFIG
    TITLE = "{} = {} fixed".format
    COLORS = ["orange", "gray", "brown"] # depends on outputs
    BAR_WIDTH = 0.9
    H, W = 300, 500
    NCOLS = 1