python pandas streamlit altair

How to summarise data to make a grouped bar chart in Altair / Streamlit


In R I would code:

library(tidyverse)
library(lubridate)
df<-data.frame(mydate=rep(as.Date("2022-01-01")+0:90, each=10),
               mygroup=sample(c("A", "B", "C"), size=910, replace=T))
df %>% 
  mutate(mymonth=round_date(mydate, "month")) %>% 
  ggplot(aes(x=mymonth, fill=mygroup))+
  geom_bar(position = "dodge")


I am learning Python/Pandas/Altair/Streamlit and want to make a Streamlit dashboard with a chart similar to the one above. I suspect there is a more efficient way to produce this chart. Here is my current best effort:

import streamlit as st #version 1.14.0
import pandas as pd    #version 1.5.1
import altair as alt   #version 4.2.0

df=pd.DataFrame({'mydate':pd.date_range(start='1/1/2020', end='4/09/2020').repeat(10),
                 'mygroup':pd.Series(["A", "B", "C"]).sample(n=1000, replace=True)})

c = alt.Chart(df).transform_window(
  sort=[{'field': 'yearmonth(mydate)'}],
  cumulative_count='count(*)', # incorrect count
).mark_bar().encode(
  x='yearmonth(mydate):O',
  y='cumulative_count:Q',
  #xOffset='mygroup:N', #error
  color='mygroup'
)
st.altair_chart(c, use_container_width=True)


The chart is currently a stacked bar chart, and I would like to make a grouped bar.

I found the xOffset property via https://stackoverflow.com/a/72092979/10276092. However, when I use it I get the error SchemaValidationError: Additional properties are not allowed ('xOffset' was unexpected)

The bars themselves look like they have horizontal banding. I suspect this is because I am not summarizing (in the transform_window?) before displaying.

As I post this, I also notice that the calculated variable cumulative_count is incorrect.


Solution

  • The solution took several steps. First, I used dt.strftime('%Y-%m-15') to collapse all the dates in a month onto a single value (I chose the 15th because using the 1st was shifting dates into the previous month. Bug?).

    Then I used groupby and count to summarize the data. reset_index() turned the result back into a flat dataframe, and rename(columns=...) gave the summary count field a proper name.

    Within the chart, I set the width and height of the chart and the size (previously barsize) of the bars. The facet= parameter turns the stacked bar chart into a grouped bar chart. Finally, configure_header cleans up the labelling.

    import streamlit as st #version 1.14.0
    import pandas as pd    #version 1.5.1
    import altair as alt   #version 4.2.0
    
    df=pd.DataFrame({'mydate':pd.date_range(start='1/1/2020', end='4/09/2020').repeat(10),
      'mygroup':pd.Series(["A", "B", "C"]).sample(n=1000, replace=True)})
    
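    # Collapse every date to the 15th of its month so each month shares one key
    df['mydate2'] = df['mydate'].dt.strftime('%Y-%m-15')
    # Count rows per (month, group); the remaining 'mydate' column holds the count
    df2 = (
        df.groupby(by=['mydate2', 'mygroup'])
        .count()
        .reset_index()
        .rename(columns={'mydate': 'counts'})
    )

    # One small facet per month, with one bar per group inside each facet
    c = (
        alt.Chart(df2, width=75, height=200)
        .mark_bar(size=20)
        .encode(
            x='mygroup:N',
            y='counts:Q',
            facet='month(mydate2):O',
            color='mygroup:N',
        )
        .configure_header(labelOrient='bottom', labelPadding=3)
        .configure_facet(spacing=5)
    )
    st.altair_chart(c)  # use_container_width=True left off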
    


    Thanks - https://stackoverflow.com/a/58739328/10276092
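
The summarizing step can also be checked on its own, without Streamlit or Altair. The sketch below is only one possible way to do it, not the code from the answer above: the helper name monthly_group_counts and the column name mymonth are made up for illustration, and it buckets by a '%Y-%m' string and uses groupby(...).size() in place of the '%Y-%m-15' plus count() approach.

```python
import pandas as pd


def monthly_group_counts(df, date_col="mydate", group_col="mygroup"):
    """Return one row per (month, group) with a 'counts' column.

    Hypothetical helper for illustration; the names are assumptions.
    """
    # Bucket each date into a 'YYYY-MM' string so all days of a month share a key
    month = df[date_col].dt.strftime("%Y-%m").rename("mymonth")
    # size() counts rows per group; reset_index(name=...) names the count column
    return df.groupby([month, group_col]).size().reset_index(name="counts")


# Small deterministic sample instead of the random data used above
sample = pd.DataFrame({
    "mydate": pd.to_datetime(
        ["2020-01-01", "2020-01-31", "2020-01-15", "2020-02-01", "2020-02-29"]
    ),
    "mygroup": ["A", "A", "B", "A", "B"],
})

summary = monthly_group_counts(sample)
print(summary)
```

A quick sanity check is that summary['counts'].sum() equals the number of input rows. If it does, the bars drawn from the summary are plain per-month counts and not a running total.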