
Altair scatter plot: Instead of size = count, can I have size = percent or proportion or something?


I'm making a scatter plot of how satisfied people are, by age:

alt.Chart(df).mark_circle().encode(
    alt.X('d2_age', bin = True),
    alt.Y('3_satisfied'),
    size = 'count()'
)

But, instead of having the size of each mark be an absolute count, I want to have it be the proportion or percent of that age range. So instead of seeing that 300 people in their 60s rated their satisfaction at a 7, you'd see that 50% of people in their 60s did.


Solution

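  • Yes, this is possible, but for this kind of calculation you must do the binning and aggregation with transforms rather than with encoding shortcuts. The `count()` shorthand in an encoding only yields the count for each mark, whereas a proportion also needs the total for each age bin, and a join-aggregate transform can attach that total to every row.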

    Here is an example of the type of chart you're asking about:

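    import altair as alt
    import pandas as pd
    import numpy as np
    
    # Reproducible sample data: 100 ages and 100 satisfaction ratings from 1 to 10
    rng = np.random.RandomState(1)
    
    df = pd.DataFrame({
        'd2_age': rng.normal(40, 10, 100),
        '3_satisfied': rng.randint(1, 11, 100)
    })
    
    alt.Chart(df).transform_bin(
        # Bin the ages explicitly so later transforms can group by the bin
        'd2_age_binned', field='d2_age'
    ).transform_joinaggregate(
        # Number of rows in each age bin, attached to every row
        total='count()',
        groupby=['d2_age_binned']
    ).transform_joinaggregate(
        # Number of rows for each (age bin, rating) pair
        in_group='count()',
        groupby=['d2_age_binned', '3_satisfied']
    ).transform_calculate(
        # Share of the age bin that gave this rating
        percentage=alt.datum.in_group / alt.datum.total
    ).mark_circle().encode(
        # bin='binned' tells Altair the field has already been binned
        alt.X('d2_age_binned:Q', bin='binned'),
        alt.Y('3_satisfied'),
        alt.Size('percentage:Q', legend=alt.Legend(format='%', title='Percent in age group'))
    )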
    

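To make the arithmetic behind the transform chain concrete, here is a minimal plain-Python sketch of the same three steps (count per age bin, count per age bin and rating, then divide). The `rows` data is made up for illustration and stands in for rows that have already been binned; none of these names come from Altair itself.

```python
from collections import Counter

# Hypothetical (age_bin_start, satisfaction) pairs standing in for binned rows.
rows = [
    (60, 7), (60, 7), (60, 3), (60, 9),
    (30, 5), (30, 5), (30, 8),
]

# Mirrors the first join-aggregate: number of rows in each age bin.
total = Counter(age_bin for age_bin, _ in rows)

# Mirrors the second join-aggregate: number of rows per (age bin, rating) pair.
in_group = Counter(rows)

# Mirrors the calculate step: share of each age bin that gave each rating.
percentage = {
    (age_bin, rating): count / total[age_bin]
    for (age_bin, rating), count in in_group.items()
}

for (age_bin, rating), share in sorted(percentage.items()):
    print(f"age bin {age_bin}, rating {rating}: {share:.0%}")
```

In this sample, two of the four people in the 60 bin gave a 7, so that pair comes out at 0.5, which is the kind of value the size encoding would then receive.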