I have the following code snippet that generates data and draws scatter plots faceted on two dimensions: subset (['train', 'test']) as columns and type (['type_a', 'type_b']) as rows.
import altair as alt
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
np.random.seed(0)
df = pd.DataFrame(data=np.random.randn(1000, 1), columns=['A'])
df['B'] = df['A'] + np.random.rand(1000)
df['subset'] = 'test'
df.loc[:500, 'subset'] = 'train'
df['type'] = 'type_a'
df.loc[300:700, 'type'] = 'type_b'
r = df.groupby(['subset', 'type']).apply(lambda x: pearsonr(x['A'], x['B'])[0])
r.name = 'correlation'
r = pd.DataFrame(r)
print(r)
alt.Chart(df).mark_point().encode(x='A', y='B', column='subset', row='type')
Now I want to annotate each facet subplot with the Pearson correlation computed by the pandas groupby above.
Is there any way to put this in the upper corner of each panel, or even in the panel title (other than editing the figure in Illustrator)?
Thanks! Max
You can see how to include a text annotation in the answer to "Altair: Extract and display regression coefficients". In your case the issue is that you have two different dataframes, and I believe it is not possible to layer charts built from two different dataframes inside a facet (someone correct me if this is wrong). You can work around this by merging them into one dataframe first:
import altair as alt
import numpy as np
import pandas as pd
from scipy.stats import pearsonr
np.random.seed(0)
df = pd.DataFrame(data=np.random.randn(1000, 1), columns=['A'])
df['B'] = df['A'] + np.random.rand(1000)
df['subset'] = 'test'
df.loc[:500, 'subset'] = 'train'
df['type'] = 'type_a'
df.loc[300:700, 'type'] = 'type_b'
r = df.groupby(['subset', 'type']).apply(lambda x: pearsonr(x['A'], x['B'])[0])
r.name = 'correlation'
r = pd.DataFrame(r)
points = alt.Chart(df.merge(r.reset_index())).mark_point().encode(x='A', y='B')
text = points.mark_text(align='left').encode(
    x=alt.value(20),  # pixels from left
    y=alt.value(20),  # pixels from top
    text='mean(correlation):N'  # taking the mean to reduce to a single value
)
(text + points).facet(column='subset', row='type')
You could probably create a more complex string using transform_calculate
with Vega expression strings: https://vega.github.io/vega/docs/expressions/.