altair vega-lite

Get Altair to save only the columns used in the chart when exporting it as JSON


Is there a canonical way to get an Altair chart to save only the dataframe columns it actually uses, rather than every column of the dataframe? One option is to subset the dataframe when creating the chart, but this quickly becomes tedious. Without subsetting, the saved chart is considerably larger.

For example:

import altair as alt
from vega_datasets import data

source = data.cars().loc[:2,:]

fg = alt.Chart(source).mark_circle(size=60).encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    tooltip=['Name', 'Horsepower', 'Miles_per_Gallon']
).interactive()

fg.save("figure.json")

gives us

{"config": {"view": {"continuousWidth": 400, "continuousHeight": 300}}, 
"data": {"name": "data-152dfe631924b6c7160a15e490a9028e"}, 
"mark": {"type": "circle", "size": 60}, 
"encoding": {"color": {"field": "Origin", "type": "nominal"}, 
"tooltip": [{"field": "Name", "type": "nominal"}, {"field": "Horsepower", "type": "quantitative"}, {"field": "Miles_per_Gallon", "type": "quantitative"}], 
"x": {"field": "Horsepower", "type": "quantitative"}, 
"y": {"field": "Miles_per_Gallon", "type": "quantitative"}}, 
"selection": {"selector006": {"type": "interval", "bind": "scales", "encodings": ["x", "y"]}}, 
"$schema": "https://vega.github.io/schema/vega-lite/v4.17.0.json", 
"datasets": {"data-152dfe631924b6c7160a15e490a9028e": 
[{"Name": "chevrolet chevelle malibu", "Miles_per_Gallon": 18.0, "Cylinders": 8, "Displacement": 307.0, "Horsepower": 130.0, "Weight_in_lbs": 3504, "Acceleration": 12.0, "Year": "1970-01-01T00:00:00", "Origin": "USA"}, 
{"Name": "buick skylark 320", "Miles_per_Gallon": 15.0, "Cylinders": 8, "Displacement": 350.0, "Horsepower": 165.0, "Weight_in_lbs": 3693, "Acceleration": 11.5, "Year": "1970-01-01T00:00:00", "Origin": "USA"}, 
{"Name": "plymouth satellite", "Miles_per_Gallon": 18.0, "Cylinders": 8, "Displacement": 318.0, "Horsepower": 150.0, "Weight_in_lbs": 3436, "Acceleration": 11.0, "Year": "1970-01-01T00:00:00", "Origin": "USA"}]}}

where columns like Displacement and Cylinders are saved even though they are not used in the chart.


Solution

  • The only way to do this automatically is to use VegaFusion, which prunes unused columns from the spec: https://vegafusion.io/#quickstart-1-overcome-maxrowserror
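
If adding VegaFusion is not an option, a rough manual fallback is to post-process the spec dictionary before writing it out. The sketch below is not part of Altair or VegaFusion; `collect_fields` and `prune_datasets` are hypothetical helper names. It assumes every inline dataset is a list of records, as in the output shown in the question, and it only detects columns referenced through a "field" key. Columns used only inside expression strings (for example `datum.Cylinders` in a calculate or filter transform) would be dropped, so treat it as a starting point rather than a general solution.

```python
import json


def collect_fields(node, found=None):
    """Recursively collect every string stored under a "field" key."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "field" and isinstance(value, str):
                found.add(value)
            else:
                collect_fields(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_fields(item, found)
    return found


def prune_datasets(spec):
    """Return a copy of spec whose inline datasets keep only referenced fields.

    Assumption: each entry in spec["datasets"] is a list of row dicts.
    """
    # Search everything except the data itself for field references.
    body = {k: v for k, v in spec.items() if k != "datasets"}
    used = collect_fields(body)
    pruned = dict(body)
    pruned["datasets"] = {
        name: [{k: v for k, v in row.items() if k in used} for row in rows]
        for name, rows in spec.get("datasets", {}).items()
    }
    return pruned


# Tiny hand-written spec standing in for the chart's dictionary form.
spec = {
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
    },
    "datasets": {
        "data-1": [
            {"Horsepower": 130.0, "Miles_per_Gallon": 18.0, "Cylinders": 8}
        ]
    },
}

print(json.dumps(prune_datasets(spec)["datasets"]))
```

With the chart from the question, the same helper could be applied to `fg.to_dict()` and the result written with `json.dump` instead of calling `fg.save("figure.json")`. The original spec is left unmodified, so the full data remains available if the pruned output turns out to be missing a column.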