Is there a canonical way to make an Altair chart save only the dataframe columns it actually uses, rather than every column of the dataframe? One option is to subset the dataframe when creating the chart, but this quickly becomes tedious, and not subsetting increases the size of the saved charts quite a bit.
For example:
import altair as alt
from vega_datasets import data
source = data.cars().loc[:2,:]
fg = alt.Chart(source).mark_circle(size=60).encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    tooltip=['Name', 'Horsepower', 'Miles_per_Gallon']
).interactive()
fg.save("figure.json")
gives us
{"config": {"view": {"continuousWidth": 400, "continuousHeight": 300}},
"data": {"name": "data-152dfe631924b6c7160a15e490a9028e"},
"mark": {"type": "circle", "size": 60},
"encoding": {"color": {"field": "Origin", "type": "nominal"},
"tooltip": [{"field": "Name", "type": "nominal"}, {"field": "Horsepower", "type": "quantitative"}, {"field": "Miles_per_Gallon", "type": "quantitative"}],
"x": {"field": "Horsepower", "type": "quantitative"},
"y": {"field": "Miles_per_Gallon", "type": "quantitative"}},
"selection": {"selector006": {"type": "interval", "bind": "scales", "encodings": ["x", "y"]}},
"$schema": "https://vega.github.io/schema/vega-lite/v4.17.0.json",
"datasets": {"data-152dfe631924b6c7160a15e490a9028e":
[{"Name": "chevrolet chevelle malibu", "Miles_per_Gallon": 18.0, "Cylinders": 8, "Displacement": 307.0, "Horsepower": 130.0, "Weight_in_lbs": 3504, "Acceleration": 12.0, "Year": "1970-01-01T00:00:00", "Origin": "USA"},
{"Name": "buick skylark 320", "Miles_per_Gallon": 15.0, "Cylinders": 8, "Displacement": 350.0, "Horsepower": 165.0, "Weight_in_lbs": 3693, "Acceleration": 11.5, "Year": "1970-01-01T00:00:00", "Origin": "USA"},
{"Name": "plymouth satellite", "Miles_per_Gallon": 18.0, "Cylinders": 8, "Displacement": 318.0, "Horsepower": 150.0, "Weight_in_lbs": 3436, "Acceleration": 11.0, "Year": "1970-01-01T00:00:00", "Origin": "USA"}]}}
where columns like Displacement and Cylinders are saved even though they are not used in the chart.
The only way to do this automatically is to use VegaFusion, which prunes unused columns from the spec before it is saved: https://vegafusion.io/#quickstart-1-overcome-maxrowserror
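If adding VegaFusion is not an option, one possible manual workaround is to post-process the spec dictionary before writing it out. The following is only a minimal sketch, not part of Altair or VegaFusion: the helper names `collect_fields` and `prune_unused_columns` are hypothetical, and it assumes that every column the chart needs appears under a "field" key somewhere in the spec and that each inline dataset is a list of records, as in the JSON above.

```python
import json


def collect_fields(node, found=None):
    """Recursively gather every string stored under a "field" key."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "field" and isinstance(value, str):
                found.add(value)
            else:
                collect_fields(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_fields(item, found)
    return found


def prune_unused_columns(spec):
    """Return a copy of spec whose inline datasets keep only referenced fields.

    Hypothetical helper: assumes datasets are lists of dict records.
    """
    # Search everything except the datasets themselves for field references.
    body = {key: value for key, value in spec.items() if key != "datasets"}
    used = collect_fields(body)
    pruned = dict(body)
    pruned["datasets"] = {
        name: [
            {col: val for col, val in row.items() if col in used}
            for row in rows
        ]
        for name, rows in spec.get("datasets", {}).items()
    }
    return pruned


# Small hand-written spec in the same shape as the saved figure.json.
spec = {
    "data": {"name": "d"},
    "mark": "circle",
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
    },
    "datasets": {
        "d": [{"Horsepower": 130.0, "Miles_per_Gallon": 18.0, "Cylinders": 8}]
    },
}

slim = prune_unused_columns(spec)
print(json.dumps(slim["datasets"]))
```

With a real chart, the same helper could be applied to `fg.to_dict()` and the result written with `json.dump`. Note that this sketch would wrongly drop a column referenced only inside an expression string (for example in a filter or calculate transform), so it is safest for charts that use plain encodings like the one above.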