python pandas numpy correlation altair

Adding R-value (correlation) to scatter chart in Altair


I am playing around with the Cars dataset and would like to add the R-value to a scatter chart. The code below produces a scatter chart and uses transform_regression to add a regression line, which works well.

from vega_datasets import data
import altair as alt
import pandas as pd
import numpy as np

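# Load the Cars dataset as a pandas DataFrame
cars = data.cars()
# Scatter plot of weight against fuel economy
chart = alt.Chart(cars).mark_circle().encode(
    alt.X('Miles_per_Gallon', scale=alt.Scale(domain=(5,50))),
    y='Weight_in_lbs'
)

# Layer a regression line on top of the scatter plot
chart + chart.transform_regression('Miles_per_Gallon','Weight_in_lbs').mark_line()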

This produces the scatter chart with the regression line drawn through it.

Next I want to get the R-value. I am not sure how to compute it with Altair, so I use pandas:

corl = cars[['Miles_per_Gallon','Weight_in_lbs']].corr().iloc[0,1]
corl
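For reference, DataFrame.corr defaults to the Pearson coefficient and leaves out rows where either value is missing (the Cars dataset has a few missing Miles_per_Gallon values). The sketch below spells out that calculation in plain Python so it is clear what number ends up on the chart. The helper name pearson_r is hypothetical and only for illustration; it is not part of pandas or Altair.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences, skipping missing pairs.

    Hypothetical helper for illustration; mirrors the default behaviour of
    DataFrame.corr for a single pair of columns.
    """
    # Keep only pairs where both values are present (not None, not NaN).
    pairs = [
        (x, y)
        for x, y in zip(xs, ys)
        if x is not None and y is not None
        and not math.isnan(x) and not math.isnan(y)
    ]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)
```

Calling pearson_r(cars['Miles_per_Gallon'], cars['Weight_in_lbs']) should agree with corl up to floating-point rounding.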

Now, how would I go about adding the R-value to the chart as a label?


Solution

  • You can do this by adding a text layer:

    text = alt.Chart({'values':[{}]}).mark_text(
        align="left", baseline="top"
    ).encode(
        x=alt.value(5),  # pixels from left
        y=alt.value(5),  # pixels from top
        text=alt.value(f"r: {corl:.3f}"),
    )
    
    chart + text + chart.transform_regression('Miles_per_Gallon','Weight_in_lbs').mark_line()
    

    In future versions of Altair, the empty data in the chart will no longer be required.