python bokeh stacked-bar-chart pandas-bokeh

How to Create a Dynamic Stacked Bar Chart in Bokeh with Variable Excel Inputs and Outputs

I am currently working on a data visualisation project. I need to create a stacked bar chart using Bokeh. The data is sourced from an Excel file that is updated regularly and contains multiple inputs and outputs. The structure of the data can vary, meaning that the number of inputs and outputs can change. My aim is to ensure that the Bokeh chart can adapt to these changes automatically without requiring manual code adjustments. The stacked bar chart should be able to dynamically adjust to changes in the number of inputs and outputs in the Excel file. It should be capable of visualising various combinations of fixed inputs (e.g. input 1 fixed, input 2 fixed, input 3 variable) while displaying all corresponding outputs. Ideally, the solution would automatically read the Excel file, detect the structure of the inputs and outputs, and update the chart accordingly.

I have tried to visualize the stacked bar charts for a set scenario ( 2 Inputs and 3 Outputs). in the example excel file the data was stored in a scheme like this this:

Example Scheme:

Input_1	Input_2	Output_1	Output_2	Output_3
1	1	100	200	200
2	1	150	150	200
3	1	200	100	200
1	2	200	200	100
2	2	150	200	150
3	2	100	200	200
1	3	200	100	200
2	3	200	150	150
3	3	200	200	100

It worked for this static scenario:

import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.layouts import gridplot, row
from bokeh.models import ColumnDataSource


data_frame = pd.read_excel("example.xlsx")

data_frame['Input_1'] = data_frame['Input_1'].astype(str)
data_frame['Input_2'] = data_frame['Input_2'].astype(str)

output_file("stacked_bar_charts.html")

unique_input_1 = data_frame['Input_1'].unique()
unique_input_2 = data_frame['Input_2'].unique()

plots_for_input_1 = []
plots_for_input_2 = []


for value in unique_input_1:

    filtered_data = data_frame[data_frame["Input_1"] == value]
    filtered_data = filtered_data.sort_values(by='Input_2')

    source = ColumnDataSource(filtered_data)

    plot = figure(title=f"Input_1 = {value} fixed",
                  x_range=filtered_data["Input_2"].unique(),
                  height=300,
                  width=500
                  )

    plot.xaxis.axis_label = "Input_2"

    plot.vbar_stack(stackers=["Output_1", "Output_2", "Output_3"],
                    x="Input_2",
                    width=0.9,
                    color=["orange", "gray", "brown"],
                    source=source,
                    legend_label=["Output 1", "Output 2", "Output 3"]
                    )

    plots_for_input_1.append(plot)


for value in unique_input_2:

    filtered_data = data_frame[data_frame['Input_2'] == value]
    filtered_data = filtered_data.sort_values(by='Input_1')

    source = ColumnDataSource(filtered_data)

    plot = figure(title=f"Input_2 = {value} fixed", 
                  x_range=filtered_data["Input_1"].unique(), 
                  height=300,
                  width=500
                  )

    plot.xaxis.axis_label = "Input_1"

    plot.vbar_stack(stackers=["Output_1", "Output_2", "Output_3"],
                    x="Input_1",
                    width=0.9,
                    color=["orange", "gray", "brown"],
                    source=source,
                    legend_label=["Output 1", "Output 2", "Output 3"]
                    )

    plots_for_input_2.append(plot)


grid_for_input_1 = gridplot(plots_for_input_1, ncols=1)  
grid_for_input_2 = gridplot(plots_for_input_2, ncols=1)

final_layout = row(grid_for_input_1, grid_for_input_2)


show(final_layout)

Here are output example pictures to show how I aim to visualize the data:

bokeh example plot 1:

bokeh example plot 1

bokeh example plot 2:

bokeh example plot 2

I was unable to find a dynamic approach that works for changing inputs and outputs, such as varying two inputs while keeping the others constant and visualizing the impacts on all outputs through stacked bar charts. Additionally, the approach should efficiently adapt to data updates without requiring manual code adjustments for each new scenario.

Solution

You can use a groupby inside a nested-loop to plot all the possible I/O combinations :

output_file("stacked_bar_charts.html")

grids = []
for inp_col in inp_cols:
    inplots = []
    for name, sub_df in data_frame.groupby(inp_col):
        for inp_diff in inp_cols.difference([inp_col], sort=False):
            plot = figure(
                title=TITLE(inp_col, name),
                x_range=data_frame[inp_diff].unique(),
                height=H, width=W,
            )

            plot.xaxis.axis_label = inp_diff

            _ = plot.vbar_stack(
                stackers=out_cols, x=inp_diff,
                width=BAR_WIDTH, color=COLORS,
                source=ColumnDataSource(sub_df.sort_values(inp_diff)),
                legend_label=out_cols.tolist(),
            )

            inplots.append(plot)
    grids.append(gridplot(inplots, ncols=NCOLS))

final_layout = row(*grids)

show(final_layout)

Output (stacked_bar_charts.html) :

Used configuration :

import pandas as pd
from bokeh.layouts import gridplot, row
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, show

# PD-PREPROCESS
data_frame = pd.read_excel("example.xlsx")

inp_cols = data_frame.filter(like="Input").columns
out_cols = data_frame.filter(like="Output").columns

data_frame = data_frame.astype(dict.fromkeys(inp_cols, str))

# BOKEH-CONFIG
TITLE = "{} = {} fixed".format
COLORS = ["orange", "gray", "brown"] # depends on outputs
BAR_WIDTH = 0.9
H, W = 300, 500
NCOLS = 1

Input_1	Input_2	Output_1	Output_2	Output_3
1	1	100	200	200
2	1	150	150	200
3	1	200	100	200
1	2	200	200	100
2	2	150	200	150
3	2	100	200	200
1	3	200	100	200
2	3	200	150	150
3	3	200	200	100

Input_1	Input_2	Output_1	Output_2	Output_3
1	1	100	200	200
2	1	150	150	200
3	1	200	100	200
1	2	200	200	100
2	2	150	200	150
3	2	100	200	200
1	3	200	100	200
2	3	200	150	150
3	3	200	200	100

Input_1	Input_2	Output_1	Output_2	Output_3
1	1	100	200	200
2	1	150	150	200
3	1	200	100	200
1	2	200	200	100
2	2	150	200	150
3	2	100	200	200
1	3	200	100	200
2	3	200	150	150
3	3	200	200	100