Annotate Altair facet plots with metrics

I have the following code snippet that generates data and draws scatter plots faceted on two dimensions: subset (['train', 'test']) as columns and type (['type_a', 'type_b']) as rows.

import altair as alt
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

np.random.seed(0)
df = pd.DataFrame(data=np.random.randn(1000, 1), columns=['A'])
df['B'] = df['A'] + np.random.rand(1000)
df['subset'] = 'test'
df.loc[:500, 'subset'] = 'train'
df['type'] = 'type_a'
df.loc[300:700, 'type'] = 'type_b'

r = df.groupby(['subset', 'type']).apply(lambda x: pearsonr(x['A'], x['B'])[0])
r.name = 'correlation'
r = pd.DataFrame(r)
print(r)

alt.Chart(df).mark_point().encode(x='A', y='B', column='subset', row='type')

Now I want to annotate each facet panel with its Pearson correlation, as computed by the pandas groupby above.

Is there any way to put this value in the upper corner of each panel, or even in the panel title (without post-editing in Illustrator)?

Thanks! Max

Solution

The general approach for adding a text annotation is shown in Altair: Extract and display regression coefficients. The complication in your case is that the points and the correlations live in two different dataframes, and I believe layers with different data cannot be combined in a facet chart, since .facet() expects the data at the top level of the chart (someone correct me if this is wrong). You can work around this by merging everything into one frame first:

import altair as alt
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


np.random.seed(0)
df = pd.DataFrame(data=np.random.randn(1000, 1), columns=['A'])
df['B'] = df['A'] + np.random.rand(1000)
df['subset'] = 'test'
df.loc[:500, 'subset'] = 'train'
df['type'] = 'type_a'
df.loc[300:700, 'type'] = 'type_b'

r = df.groupby(['subset', 'type']).apply(lambda x: pearsonr(x['A'], x['B'])[0])
r.name = 'correlation'
r = pd.DataFrame(r)

points = alt.Chart(df.merge(r.reset_index())).mark_point().encode(x='A', y='B')
text = points.mark_text(align='left').encode(
    x=alt.value(20),  # pixels from left
    y=alt.value(20),  # pixels from top
    text=alt.Text('mean(correlation):Q', format='.2f')  # mean of the identical per-panel values, rounded to 2 decimals
)

(text + points).facet(column='subset', row='type')
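To see why the merge is enough, here is a minimal pandas-only sketch (no plotting) that reuses the data above. As an assumption, it swaps scipy's pearsonr for pandas' Series.corr, which also computes the Pearson coefficient by default; the names merged and per_panel are just illustrative. It shows that every row picks up the correlation of its own panel, so the mean(correlation) aggregate in each facet simply returns that single value.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the question
np.random.seed(0)
df = pd.DataFrame(data=np.random.randn(1000, 1), columns=['A'])
df['B'] = df['A'] + np.random.rand(1000)
df['subset'] = 'test'
df.loc[:500, 'subset'] = 'train'
df['type'] = 'type_a'
df.loc[300:700, 'type'] = 'type_b'

# Pearson r per (subset, type) cell; Series.corr defaults to method='pearson',
# standing in here for scipy.stats.pearsonr(...)[0]
r = (
    df.groupby(['subset', 'type'])[['A', 'B']]
    .apply(lambda g: g['A'].corr(g['B']))
    .rename('correlation')
    .reset_index()
)

# merge() joins on the shared columns 'subset' and 'type', so every row
# carries the correlation of the panel it will be drawn in
merged = df.merge(r)

# Within each panel the value is constant, so mean() just recovers it
per_panel = merged.groupby(['subset', 'type'])['correlation'].agg(['nunique', 'mean'])
print(per_panel)
```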

For a richer label (for example an "r = " prefix followed by the rounded value), you could build the string with transform_calculate and a Vega expression; see https://vega.github.io/vega/docs/expressions/.