python pandas dataframe bokeh

How do I reshape my pandas DataFrame to visualize it as a stacked bar chart with bokeh?


I would like to create a stacked bar chart from this data frame, where the x axis shows each unique date and each segment of a bar is the numerical value for one entry in the provider column.

[Image: grouped data frame]

When I create a pivot table, the data is aggregated for the columns that have exactly the same name. If I pivot with 'provider' as the new columns, I get 5 columns and 14 rows. As far as I can tell, the issue is that bokeh's vbar_stack does not accept a different number of columns and rows; they must be the same. However, I cannot build the pivot table without the data being aggregated.

How can I transform this data and use the bokeh package to create a stacked bar chart?

Code:

pivot_df = grouped_df.pivot_table(index=['date'], columns='provider', values='num_youths', aggfunc='first', fill_value=0)

pivot_df.reset_index(inplace=True)

source = ColumnDataSource(pivot_df)

providers = pivot_df.columns[1:]

# Create the figure
p = figure(x_range=pivot_df['date'].unique(), plot_height=350, title="Number of Youths Funded by Provider Each Month",
           toolbar_location=None, tools="")

# Add stacked bars to the figure
p.vbar_stack(stackers=providers, x='date', width=0.9, color=["blue", "red"], source=source,
             legend_label=providers)

Error message: ValueError: Keyword argument sequences for broadcasting must be the same length as stackers


Solution

  • You have to reshape your pandas DataFrame so that each provider becomes its own column.

    Pandas

    Below is a minimal example of your data. I use groupby and unstack with fill_value=0 to add zeros wherever a group has no value on a given date.

    Afterwards I drop the multi-index of the returned DataFrame.

    import pandas as pd
    
    df = pd.DataFrame({
        'date': ['Aug 23', 'Aug 23', 'Dec 23'],
        'provider': ['A', 'B', 'C'],
        'num_youths': [1, 3, 4]
        }
    )
    >>> df
         date provider  num_youths
    0  Aug 23        A           1
    1  Aug 23        B           3
    2  Dec 23        C           4
    
    # group by date and provider, then fill missing combinations with zero
    stacked = df.groupby(['date','provider']).sum().unstack(fill_value=0)
    >>> stacked 
             num_youths      
    provider          A  B  C
    date                     
    Aug 23            1  3  0
    Dec 23            0  0  4
    
    # drop the top column level and turn the date index back into a column
    stacked.columns = stacked.columns.droplevel()
    provider = list(stacked.columns)
    stacked = stacked.reset_index()
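
The same wide table can probably also be built with pivot_table, which is closer to the code in the question. The sketch below assumes that summing is the aggregation you want (the question used aggfunc='first', which keeps only the first value per date and provider). The names pivoted and provider_names are only illustrative.

```python
import pandas as pd

# Same minimal data as in the example above.
df = pd.DataFrame({
    'date': ['Aug 23', 'Aug 23', 'Dec 23'],
    'provider': ['A', 'B', 'C'],
    'num_youths': [1, 3, 4],
})

# One row per date, one column per provider, zeros where a provider is missing.
pivoted = df.pivot_table(index='date', columns='provider',
                         values='num_youths', aggfunc='sum', fill_value=0)

# Remember the provider columns before 'date' becomes a regular column.
provider_names = list(pivoted.columns)
pivoted = pivoted.reset_index()
```

Because values names a single column, the result has flat column labels and no droplevel call is needed.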
    

    Bokeh accepts a dict of equal-length lists as a data source, so convert the DataFrame by calling to_dict with orient="list".

    data = stacked.to_dict(orient='list')
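
For the minimal example above, the resulting dict should look like the literal in the sketch below: one list per column, all of the same length. The helper check_stack_data is hypothetical (it is not part of pandas or bokeh) and only illustrates the shape that a stacked bar chart needs.

```python
def check_stack_data(data, x, stackers):
    """Return the number of bars if every stacker column matches the x column in length."""
    n_bars = len(data[x])
    for name in stackers:
        if name not in data:
            raise KeyError(f"missing column: {name}")
        if len(data[name]) != n_bars:
            raise ValueError(f"column {name!r} has {len(data[name])} values, expected {n_bars}")
    return n_bars

# Shape produced by to_dict(orient='list') for the example data.
data = {
    'date': ['Aug 23', 'Dec 23'],
    'A': [1, 0],
    'B': [3, 0],
    'C': [0, 4],
}

n_bars = check_stack_data(data, 'date', ['A', 'B', 'C'])
print(n_bars)  # 2
```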
    

    bokeh

    The data now has the correct format, so just call figure() and vbar_stack. Most of this code comes from the stacked bar example in the docs.

    from bokeh.plotting import figure, show, output_notebook
    from bokeh.palettes import HighContrast3
    output_notebook()
    
    p = figure(x_range=data['date'], height=250, 
               toolbar_location=None, tools="hover", tooltips="@date $name @$name")
    
    p.vbar_stack(provider, x='date', width=0.9, color=HighContrast3, source=data,
                 legend_label=provider)
    
    show(p)
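
The ValueError in the question most likely comes from color=["blue", "red"]: two colors were passed for five provider columns, while the call above works because HighContrast3 has exactly three colors for three providers. A minimal sketch for building one color per stacker by repeating a short base list (colors_for and the provider names below are hypothetical):

```python
from itertools import cycle, islice

def colors_for(stackers, base_colors):
    """Return one color per stacker, repeating base_colors as often as needed."""
    return list(islice(cycle(base_colors), len(stackers)))

providers = ['A', 'B', 'C', 'D', 'E']  # hypothetical provider names
colors = colors_for(providers, ['blue', 'red'])
print(colors)  # ['blue', 'red', 'blue', 'red', 'blue']
```

The resulting list has the same length as the stackers, so it can be passed as color= to vbar_stack. In practice a palette with enough distinct colors reads better than a repeating pair.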
    

    [Image: stacked bar plot]