In R I would code:
library(tidyverse)
library(lubridate)
df <- data.frame(mydate = rep(as.Date("2022-01-01") + 0:90, each = 10),
                 mygroup = sample(c("A", "B", "C"), size = 910, replace = TRUE))
df %>%
  mutate(mymonth = round_date(mydate, "month")) %>%
  ggplot(aes(x = mymonth, fill = mygroup)) +
  geom_bar(position = "dodge")
I am learning Python/Pandas/Altair/Streamlit and want to build a Streamlit dashboard with a chart similar to the one above. I suspect there is a more efficient way to produce this chart. Here is my current best effort:
import streamlit as st #version 1.14.0
import pandas as pd #version 1.5.1
import altair as alt #version 4.2.0
df = pd.DataFrame({'mydate': pd.date_range(start='1/1/2020', end='4/09/2020').repeat(10),
                   'mygroup': pd.Series(["A", "B", "C"]).sample(n=1000, replace=True)})
c = alt.Chart(df).transform_window(
    sort=[{'field': 'yearmonth(mydate)'}],
    cumulative_count='count(*)',  # incorrect count
).mark_bar().encode(
    x='yearmonth(mydate):O',
    y='cumulative_count:Q',
    #xOffset='mygroup:N',  #error
    color='mygroup'
)
st.altair_chart(c, use_container_width=True)
The chart is currently a stacked bar chart, and I would like to make it a grouped bar chart.
I found the xOffset property via https://stackoverflow.com/a/72092979/10276092. However, I get the message SchemaValidationError: Additional properties are not allowed ('xOffset' was unexpected)
The bars themselves look like they have horizontal banding. I suspect this is because I am not summarizing (in the transform_window?) before displaying.
As I post this, I notice the calculated variable cumulative_count is incorrect.
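The solution was multi-step. First, use dt.strftime('%Y-%m-15') to bring all the dates in each month together. (I chose the 15th because the 1st was rounding down into the previous month. Bug? It is probably a time zone effect: a date-only string is parsed as UTC midnight in the browser and then displayed in local time, which falls on the previous day in time zones behind UTC.) Then groupby and count summarize the data, .reset_index creates a tidy dataframe, and rename(columns=...) gives the summary count field a proper name.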
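The summarizing steps above can be tried in isolation on a few rows. This is a minimal sketch that uses a small hand-written frame (hypothetical values, not the random sample from the question) so the result is easy to check by eye. It also shows groupby(...).size() as a possible shortcut, which should give the same counts as long as mydate has no missing values.

```python
import pandas as pd

# Tiny deterministic stand-in for the random sample data (hypothetical values).
df = pd.DataFrame({
    "mydate": pd.to_datetime(
        ["2020-01-03", "2020-01-20", "2020-01-28", "2020-02-02", "2020-02-14"]
    ),
    "mygroup": ["A", "A", "B", "A", "C"],
})

# Collapse every date onto the 15th of its month, stored as a string.
df["mydate2"] = df["mydate"].dt.strftime("%Y-%m-15")

# Count rows per (month, group); count() tallies the remaining 'mydate' column,
# which is then renamed to 'counts'.
df2 = (
    df.groupby(by=["mydate2", "mygroup"])
    .count()
    .reset_index()
    .rename(columns={"mydate": "counts"})
)

# Possible shortcut: size() counts rows directly and names the column in one step.
df2_alt = df.groupby(["mydate2", "mygroup"]).size().reset_index(name="counts")

print(df2)
```

Either frame has one row per month and group, which is the shape the chart needs: one bar per row instead of one bar segment per raw record.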
Within the chart, set the width and height of the chart and the size (previously barsize) of the bars. The facet= parameter turns the stacked bar chart into a grouped bar chart. Finally, configure_header cleans up the labeling.
import streamlit as st #version 1.14.0
import pandas as pd #version 1.5.1
import altair as alt #version 4.2.0
df = pd.DataFrame({'mydate': pd.date_range(start='1/1/2020', end='4/09/2020').repeat(10),
                   'mygroup': pd.Series(["A", "B", "C"]).sample(n=1000, replace=True)})
df['mydate2'] = df['mydate'].dt.strftime('%Y-%m-15')
df2 = df.groupby(by=['mydate2', 'mygroup']).count().reset_index().rename(columns={'mydate': 'counts'})
c = alt.Chart(df2, width=75, height=200).mark_bar(size=20).encode(
    x='mygroup:N',
    y='counts:Q',
    facet='month(mydate2):O',
    color='mygroup:N'
).configure_header(
    labelOrient='bottom',
    labelPadding=3
).configure_facet(
    spacing=5
)
st.altair_chart(c)  # , use_container_width=True)