python plot bar-chart altair

Grouped bar charts in Altair using two different columns


TL;DR: How do I make a grouped bar chart in the most recent version of Altair where the grouped bars come from different columns of quantitative data, as opposed to one column of categorical data?

While I've found some great answers on here about creating grouped bar charts in Altair (like this one), none answer my specific question.

I have a table with several columns. Two are quantitative and represent different values that could be grouped into one category (e.g. 'cm_of_rain' and 'cm_of_snow' could be summed and called something like 'cm_of_precipitation'), one holds the months as ordinal strings, and another holds the day as a number. A dataframe of the data would look something like this:

import pandas as pd

data = {'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar', 'Apr', 'Apr'],
        'Day': [1, 15, 1, 15, 1, 15, 1, 15],
        'cm_of_rain': [20, 21, 19, 18, 1, 12, 33, 12],
        'cm_of_snow': [0, 2, 6, 3, 4, 2, 5, 11]}

df = pd.DataFrame(data)
print(df)

 Month  Day  cm_of_rain  cm_of_snow
   Jan    1          20           0
   Jan   15          21           2
   Feb    1          19           6
   Feb   15          18           3
   Mar    1           1           4
   Mar   15          12           2
   Apr    1          33           5
   Apr   15          12          11

I want to make a bar plot where the data is grouped by month on the X axis and cm of precipitation is shown on the Y-axis, but rather than having a stacked bar plot where rain and snow are additive, I want to plot the two values as side-by-side bars for each month. So the result should look something like the grouped bar plot from the post linked above

Example of a grouped bar chart, taken from this linked StackOverflow post.

except Genre ("Action", "Crime") would be replaced by Month ("Jan", "Feb", "Mar", "Apr"), Gender (F, M) would be replaced by Precipitation_Type (rain, snow), and Rating would be replaced by Precipitation_(cm).

For context, the main difference between my question and earlier ones is that the data I want to group comes from two different columns of quantitative data in my dataframe, whereas every other post I've seen uses categorical data from a single column.


Solution

  • What you have is usually referred to as "wide form" or "untidy" data. Altair generally works better with "long form" or "tidy" data. You can read more about how to convert between the two in the documentation, but one way is to use transform_fold, which folds the listed columns into a key column and a value column.

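    import altair as alt
    import pandas as pd

    # Same data as in the question, with the two value columns renamed
    # to 'rain' and 'snow' so they read well as legend entries.
    data = {'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar', 'Apr', 'Apr'],
            'Day': [1, 15, 1, 15, 1, 15, 1, 15],
            'rain': [20, 21, 19, 18, 1, 12, 33, 12],
            'snow': [0, 2, 6, 3, 4, 2, 5, 11]}

    df = pd.DataFrame(data)

    alt.Chart(df).mark_bar().encode(
        x='amount (cm):Q',
        y='type:N',
        color='type:N',
        # One facet row per month, in calendar order.
        row=alt.Row('Month', sort=['Jan', 'Feb', 'Mar', 'Apr'])
    ).transform_fold(
        # Fold 'rain' and 'snow' into long form: 'type' holds the original
        # column name and 'amount (cm)' holds the value.
        as_=['type', 'amount (cm)'],
        fold=['rain', 'snow']
    )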
    

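The same wide-to-long conversion can also be done in pandas before the data reaches Altair. The following is a minimal sketch, assuming the renamed 'rain' and 'snow' columns from the answer; the names long_df and amount_cm are made up for illustration and are not part of the original answer.

```python
import pandas as pd

data = {'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar', 'Apr', 'Apr'],
        'Day': [1, 15, 1, 15, 1, 15, 1, 15],
        'rain': [20, 21, 19, 18, 1, 12, 33, 12],
        'snow': [0, 2, 6, 3, 4, 2, 5, 11]}
df = pd.DataFrame(data)

# Wide -> long: each (Month, Day) row becomes two rows, one per
# precipitation type. 'type' and 'amount_cm' are hypothetical names.
long_df = df.melt(id_vars=['Month', 'Day'],
                  value_vars=['rain', 'snow'],
                  var_name='type',
                  value_name='amount_cm')

print(long_df.head())
```

A long-form frame like this can be passed to alt.Chart directly, with 'type' as a nominal field and 'amount_cm' as a quantitative one, so no transform_fold is needed. In Altair 5 or later, encoding xOffset='type:N' together with x='Month:N' and y='sum(amount_cm):Q' should give vertical side-by-side bars per month, which is closer to the layout described in the question.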