Altair syntax for condition involving aggregate field


I am trying to create a condition that involves an aggregate field. With this example dataset and a condition on the raw fields:

import altair as alt
import pandas as pd

df = pd.DataFrame([['game1','player1',2,1],['game1','player2',3,4],['game1','player3',2,2],
                   ['game2','player1',0,3],['game2','player2',4,4],['game2','player3',3,3]],
                  columns=['game','player','score1','score2'])
# Color each point by comparing the two raw scores of that row
color = {'condition': [{"value":"green","test":"datum.score2 > datum.score1"},
                       {"value":"yellow","test":"datum.score2 == datum.score1"},
                       {"value":"red","test":"datum.score2 < datum.score1"}]}
alt.Chart(df).mark_point().encode(x='score2',y='player',color=color)

This renders a chart with one point per row, colored according to the condition.

But when I want the chart to display only the average for each player, I can't find a syntax for the condition that works:

alt.Chart(df).mark_point().encode(x='mean(score2)',y='player',color=color)

I tried:

"test":"mean(datum.score2) > mean(datum.score1)"

and

"test":"datum.mean(score2) > datum.mean(score1)"

Neither worked, and I couldn't find the syntax for this in the documentation.


Solution

  • The 'mean(score2)' shorthand is available in encoding fields and transforms, but not in condition tests, which are evaluated against the fields of each individual datum. To use the means in a condition, first create them as new fields in a separate transform step. transform_aggregate would do this, but here we use transform_joinaggregate because you want to plot the original values in your dataframe rather than the aggregated ones:

    color={
        'condition': [
            {"value":"green", "test": "datum.mean_score2 > datum.mean_score1"},
            {"value":"yellow", "test": "datum.mean_score2 == datum.mean_score1"},
            {"value":"red", "test": "datum.mean_score2 < datum.mean_score1"}
        ]
    }
    
    alt.Chart(df).mark_point().encode(
        x='score2',
        y='player',
        color=color
    ).transform_joinaggregate(
        mean_score1='mean(score1)',
        mean_score2='mean(score2)',
        groupby=['player']
    )
    

    If you want to plot the mean values instead, use transform_aggregate, which reduces the data to one row per player:

    alt.Chart(df).mark_point().encode(
        x='mean_score2:Q',
        y='player',
        color=color
    ).transform_aggregate(
        mean_score1='mean(score1)',
        mean_score2='mean(score2)',
        groupby=['player']
    )
    

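
To make the difference between the two transforms concrete, here is a rough pandas analogue. It is only a sketch of the resulting table shapes, not a description of how Altair works internally (the transforms themselves run in Vega-Lite when the chart renders). The `classify` helper and the `color` column are hypothetical names added for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['game1', 'player1', 2, 1], ['game1', 'player2', 3, 4], ['game1', 'player3', 2, 2],
     ['game2', 'player1', 0, 3], ['game2', 'player2', 4, 4], ['game2', 'player3', 3, 3]],
    columns=['game', 'player', 'score1', 'score2'],
)

by_player = df.groupby('player')

# Rough analogue of transform_joinaggregate: all six original rows are kept
# and the per-player means are attached as extra columns.
joined = df.assign(
    mean_score1=by_player['score1'].transform('mean'),
    mean_score2=by_player['score2'].transform('mean'),
)

# Rough analogue of transform_aggregate: the data collapses to one row per
# player, and the original score1/score2 columns are gone.
aggregated = by_player.agg(
    mean_score1=('score1', 'mean'),
    mean_score2=('score2', 'mean'),
).reset_index()


def classify(row):
    """Hypothetical helper mirroring the three condition tests."""
    if row['mean_score2'] > row['mean_score1']:
        return 'green'
    if row['mean_score2'] == row['mean_score1']:
        return 'yellow'
    return 'red'


aggregated['color'] = aggregated.apply(classify, axis=1)

print(joined)
print(aggregated)
```

With this dataset, player1 and player2 come out green and player3 yellow. No player has a lower mean score2 than mean score1, so red does not appear.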