Context: Following the docs, it is possible to add a regression line to a Vega-Lite plot, e.g.:
{
"data": {
"url": "data/movies.json"
},
"layer": [
{
"mark": {
"type": "point",
"filled": true
},
"encoding": {
"x": {
"field": "Rotten Tomatoes Rating",
"type": "quantitative"
},
"y": {
"field": "IMDB Rating",
"type": "quantitative"
}
}
},
{
"mark": {
"type": "line",
"color": "firebrick"
},
"transform": [
{
"regression": "IMDB Rating",
"on": "Rotten Tomatoes Rating"
}
],
"encoding": {
"x": {
"field": "Rotten Tomatoes Rating",
"type": "quantitative"
},
"y": {
"field": "IMDB Rating",
"type": "quantitative"
}
}
},
{
"transform": [
{
"regression": "IMDB Rating",
"on": "Rotten Tomatoes Rating",
"params": true
},
{"calculate": "'R²: '+format(datum.rSquared, '.2f')", "as": "R2"}
],
"mark": {
"type": "text",
"color": "firebrick",
"x": "width",
"align": "right",
"y": -5
},
"encoding": {
"text": {"type": "nominal", "field": "R2"}
}
}
]
}
Question: Here, the R² value is exposed through 'datum.rSquared', but how can I access the fitted parameters and other related information?
I couldn't find this in the documentation or elsewhere, and now that Vega has tightened its security I couldn't work out how to list the keys of 'datum' with JavaScript. Any information would be much appreciated, and bonus points if someone can suggest a good debugging practice for this situation.
According to the documentation for the Vega Regression transform, the options "params" and "as" control whether Vega outputs the fit parameters or the coordinates used to plot the regression lines:
"params": A boolean flag indicating if the transform should return the fit model parameters (one object per group), rather than trend line points. The resulting objects include a coef array of fitted coefficient values (starting with the intercept term and then including terms of increasing order) and an rSquared value (indicating the total variance explained by the model).
"as": The output fields for the predictor and predicted values for the line of best fit. If unspecified, the x and y parameter field names will be used.
Using the Vega example for regression, here is a working example of how to get coefficients and R-squared values.
Apparently a Vega regression transform outputs either the fit parameters ("params": true) or the trend line points (named via "as"), but not both. As a workaround, we can get the coefficients and R-squared by adding another dataset ("trend_params") that duplicates "trend" but sets "params": true in place of "as".
The "DATA VIEWER" tab of the Vega online editor shows the fields and values of each dataset, which also makes it a convenient way to check what a transform puts on 'datum'. The "trend" dataset holds the points used to plot the regression line(s), whereas "trend_params" has the fields coef (array of coefficients) and rSquared.
For a linear regression over all data points, the y-intercept of the fitted line can be read from the dataset "trend_params" with the signal expression datum.coef[0] or datum['coef'][0], and the slope of the line with datum['coef'][1].
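To make the coefficients easier to spot in the Data Viewer, they can also be copied into named fields. The fragment below is a minimal sketch of a simplified "trend_params" dataset, assuming a fixed linear method and no grouping; the field names "intercept" and "slope" are illustrative and not part of the original example.

```json
{
  "name": "trend_params",
  "source": "movies",
  "transform": [
    {
      "type": "regression",
      "method": "linear",
      "x": "Rotten Tomatoes Rating",
      "y": "IMDB Rating",
      "params": true
    },
    {"type": "formula", "expr": "datum.coef[0]", "as": "intercept"},
    {"type": "formula", "expr": "datum.coef[1]", "as": "slope"}
  ]
}
```

With this in place, a mark that reads from "trend_params" could refer to datum.intercept and datum.slope instead of indexing into coef. For the higher-order methods, coef has more entries, so further formula steps would be needed.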
Try the different options for regression model and grouping in the spec below, and observe both the plotted chart and the data in the datasets "trend" and "trend_params".
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"description": "A scatter plot with trend line calculated via user-configurable regression methods.",
"padding": 5,
"width": 500,
"height": 500,
"autosize": "pad",
"signals": [
{
"name": "method", "value": "linear",
"bind": {"input": "select", "options": [
"linear", "log", "exp", "pow", "quad", "poly"
]}
},
{
"name": "polyOrder", "value": 3,
"bind": {"input": "range", "min": 1, "max": 3, "step": 1}
},
{
"name": "groupby", "value": "none",
"bind": {"input": "select", "options": ["none", "genre"]}
}
],
"data": [
{
"name": "movies",
"url": "data/movies.json",
"transform": [
{
"type": "filter",
"expr": "datum['Rotten Tomatoes Rating'] != null && datum['IMDB Rating'] != null"
}
]
},
{
"name": "trend",
"source": "movies",
"transform": [
{
"type": "regression",
"groupby": [{"signal": "groupby === 'genre' ? 'Major Genre' : 'foo'"}],
"method": {"signal": "method"},
"order": {"signal": "polyOrder"},
"extent": {"signal": "domain('x')"},
"x": "Rotten Tomatoes Rating",
"y": "IMDB Rating",
"as": ["u", "v"]
}
]
},
{
"name": "trend_params",
"source": "movies",
"transform": [
{
"type": "regression",
"groupby": [{"signal": "groupby === 'genre' ? 'Major Genre' : 'foo'"}],
"method": {"signal": "method"},
"order": {"signal": "polyOrder"},
"extent": {"signal": "domain('x')"},
"x": "Rotten Tomatoes Rating",
"y": "IMDB Rating",
"params": true
}
]
}
],
"scales": [
{
"name": "x",
"type": "linear",
"domain": {"data": "movies", "field": "Rotten Tomatoes Rating"},
"range": "width"
},
{
"name": "y",
"type": "linear",
"domain": {"data": "movies", "field": "IMDB Rating"},
"range": "height"
}
],
"axes": [
{
"title": "Rotten Tomatoes Rating",
"orient": "bottom",
"scale": "x",
"grid": true,
"tickCount": 5
},
{
"title": "IMDB Rating",
"orient": "left",
"scale": "y",
"grid": true,
"tickCount": 5
}
],
"marks": [
{
"type": "symbol",
"from": {"data": "movies"},
"encode": {
"enter": {
"x": {"scale": "x", "field": "Rotten Tomatoes Rating"},
"y": {"scale": "y", "field": "IMDB Rating"},
"fillOpacity": {"value": 0.5},
"size": {"value": 16}
}
}
},
{
"type": "group",
"from": {
"facet": {
"data": "trend",
"name": "curve",
"groupby": "Major Genre"
}
},
"marks": [
{
"type": "line",
"from": {"data": "curve"},
"encode": {
"enter": {
"x": {"scale": "x", "field": "u"},
"y": {"scale": "y", "field": "v"},
"stroke": {"value": "firebrick"}
}
}
},
{
"type": "text",
"from": {"data": "trend_params"},
"encode": {
"update": {
"text": {"signal": "['Regression method: ' + method, '.', 'Group by: ' + groupby, '.', 'Coefficients: ', datum['coef'][0], datum['coef'][1], datum['coef'][2], datum['coef'][3], '.', 'R-squared: ' + datum['rSquared'] ]" },
"x": {"value": 250},
"y": {"value": 300},
"fill": {"value": "black"},
"fillOpacity": {"value": 1.0},
"fontSize": {"value": 16}
}
}
}
]
}
]
}
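The same fields should also be reachable from the Vega-Lite spec in the question, since its regression transform with "params": true likewise returns a coef array and an rSquared value. As a rough sketch, the third layer of that spec could be replaced with something like the following; the output field name "fit" and the label layout are assumptions made for illustration.

```json
{
  "transform": [
    {
      "regression": "IMDB Rating",
      "on": "Rotten Tomatoes Rating",
      "params": true
    },
    {
      "calculate": "'y = ' + format(datum.coef[0], '.3f') + ' + ' + format(datum.coef[1], '.3f') + ' x,  R²: ' + format(datum.rSquared, '.2f')",
      "as": "fit"
    }
  ],
  "mark": {
    "type": "text",
    "color": "firebrick",
    "x": "width",
    "align": "right",
    "y": -5
  },
  "encoding": {
    "text": {"type": "nominal", "field": "fit"}
  }
}
```

Here datum.coef[0] is the intercept and datum.coef[1] the slope of the default linear fit. A negative slope would be printed as "+ -0.123", so the label expression may need adjusting for real use.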