Tags: python, annotations, facet, altair

Annotate altair facet plots with metrics


I have the following code snippet that generates data and draws scatter plots faceted on two dimensions, with subset (['train', 'test']) as columns and type (['type_a', 'type_b']) as rows.

import altair as alt
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

np.random.seed(0)
df = pd.DataFrame(data=np.random.randn(1000, 1), columns=['A'])
df['B'] = df['A'] + np.random.rand(1000)
df['subset'] = 'test'
df.loc[:500, 'subset'] = 'train'
df['type'] = 'type_a'
df.loc[300:700, 'type'] = 'type_b'

r = df.groupby(['subset', 'type']).apply(lambda x: pearsonr(x['A'], x['B'])[0])
r.name = 'correlation'
r = pd.DataFrame(r)
print(r)

alt.Chart(df).mark_point().encode(x='A', y='B', column='subset', row='type')

Now I want to annotate each facet subplot with the Pearson correlation calculated by the pandas groupby above.

Is there any way to put this in the upper corner of each panel, or even in the panel title, without editing the figure afterwards in Illustrator?

Thanks! Max


Solution

  • You can see how to include a text annotation in "Altair: Extract and display regression coefficients". In your case the issue is that you have two different dataframes, and I believe it is not possible to layer charts built from two different dataframes in a facet (someone correct me if this is wrong). You can work around this by merging them into one dataframe first:

    import altair as alt
    import numpy as np
    import pandas as pd
    from scipy.stats import pearsonr
    
    
    np.random.seed(0)
    df = pd.DataFrame(data=np.random.randn(1000, 1), columns=['A'])
    df['B'] = df['A'] + np.random.rand(1000)
    df['subset'] = 'test'
    df.loc[:500, 'subset'] = 'train'
    df['type'] = 'type_a'
    df.loc[300:700, 'type'] = 'type_b'
    
    r = df.groupby(['subset', 'type']).apply(lambda x: pearsonr(x['A'], x['B'])[0])
    r.name = 'correlation'
    r = pd.DataFrame(r)
    
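    # Merge the per-group correlation back onto every row so both layers share one dataframe
    points = alt.Chart(df.merge(r.reset_index())).mark_point().encode(x='A', y='B')
    # Reuse the same chart for the text layer, overriding the mark and the encodings
    text = points.mark_text(align='left').encode(
        x=alt.value(20),  # pixels from left
        y=alt.value(20),  # pixels from top
        text='mean(correlation):N'  # taking the mean to reduce to a single value
    )
    
    # Layer first, then facet, so each panel gets its own points and annotation
    (text + points).facet(column='subset', row='type')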
    


    You could probably build a more complex string using transform_calculate with Vega expression strings: https://vega.github.io/vega/docs/expressions/.