I'm trying to add a regression line and R-squared value on my scatterplot. I know I'm supposed to use the layer function and
"transform": [
{
"regression": "GDP per capita",
"on": "Educationalattainment",
}
but after trying a million times I can't figure out where to insert the line of codes. This is the code for my chart
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"title": {
"text": "GDP per capita and Education Attainment",
"subtitle": "From 2015-2020. Sources: World Bank",
"subtitleFontStyle": "italic",
"subtitleFontSize": 10,
"anchor": "start",
"color": "black"
},
"height": 300,
"width": 300,
"data": {
"url": "https://raw.githubusercontent.com/jamieprince/jamieprince.github.io/main/correlation.csv"
},
"transform": [
{"calculate": "datum.Educationalattainment/100", "as": "percent"},
{"filter": {
"field": "Educationalattainment",
"gt": 0
}}
],
"selection": {
"paintbrush": {
"type": "multi",
"on": "mouseover",
"nearest": true
},
"grid": {
"type": "interval",
"bind": "scales"
}
},
"mark": {
"type": "circle",
"opacity": 0.5,
"color": "#EC9D3E"
},
"encoding": {
"x": {
"field": "GDP per capita",
"type": "quantitative",
"axis": {
"title": "GDP per capita",
"grid": false,
"tickCount": 10,
"labelOverlap": "greedy"
}
},
"y": {
"field": "percent",
"type": "quantitative",
"axis": {
"title": "Educational Attainment",
"grid": false, "format":"%"
}
},
"size": {
"condition": {
"selection": "paintbrush",
"value": 300,
"init": {
"value": 70
}
},
"value": 70
},
"tooltip": [
{
"field": "Year",
"type": "nominal",
"title": "Year"
},
{
"field": "Country",
"type": "ordinal",
"title": "Country"
},
{
"field": "GDP per capita",
"type": "nominal",
"title": "GDP per capita"
},
{
"field": "Educationalattainment",
"type": "nominal",
"title": "Educational attainment at least completed short-cycle tertiary population 25+ total (%) (cumulative)"
}
]
}
}
And this is my reference chart
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "Figure 5: Plotting a regression of Social Mobility Index on Global Entrepreneurship Index, equation acquired via Python",
"data": {
"url": "https://raw.githubusercontent.com/marinabrts/marinabrts.github.io/main/GEIxSMI.csv",
"format": {"type": "csv"}
},
"background": "#E0E0E0",
"config": {"axis": {"grid": true, "gridColor": "#FFFFFF"}},
"title": {
"text": "Figure 5: Regressing SMI on Global Entrepreneurship Index",
"subtitle": "Source: World Economic Forum (2020), Global Entrepreneurship & Development Institute (2019)",
"subtitleFontStyle": "italic",
"subtitleFontSize": 10,
"anchor": "start"
},
"height": 300,
"width": 370,
"layer": [
{
"mark": {"type": "point", "size": 30, "color": "#FF3399"},
"encoding": {
"x": {
"field": "GEI",
"type": "quantitative",
"title": "Global Entrepreneurship Index (GEI)"
},
"y": {
"field": "Index Score",
"type": "quantitative",
"title": "Social Mobility Index (SMI)",
"scale": {"domain": [30, 90]}
},
"tooltip": [
{"field": "Country", "type": "nominal", "title": "Country"},
{"field": "GEI", "type": "quantitative", "title": "GEI"},
{"field": "Index Score", "type": "quantitative", "title": "SMI"}
]
}
},
{
"mark": {"type": "line", "color": "#7F00FF", "size": 3},
"transform": [{"regression": "Index Score", "on": "GEI"}],
"encoding": {
"x": {"field": "GEI", "type": "quantitative"},
"y": {"field": "Index Score", "type": "quantitative"}
}
},
{
"transform": [
{"regression": "Index Score", "on": "GEI", "params": true},
{"calculate": "'R²= '+format(datum.rSquared, '.2f')", "as": "R2"}
],
"mark": {
"type": "text",
"color": "red",
"size": 14,
"x": "width",
"align": "center",
"y": -5
},
"encoding": {"text": {"type": "nominal", "field": "R2"}}
}
]
}
I'd be so so grateful for any help. Thank you!
Edit:
Code for R-squared value
{
"transform": [
{
"regression": "GDP per capita",
"on": "percent",
"params": true
},
{"calculate": "'R²: '+format(datum.rSquared, '.2f')", "as": "R2"}
],
"mark": {
"type": "text",
"color": "black",
"x": "width",
"align": "right",
"y": -5
},
"encoding": {
"text": {"type": "nominal", "field": "R2"}
}
}
Complete chart code where the value does not appear
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"title": {
"text": null,
"subtitle": null,
"subtitleFontStyle": "italic",
"subtitleFontSize": 10,
"anchor": "start",
"color": "black"
},
"height": 100,
"width": 100,
"data": {
"url": "https://raw.githubusercontent.com/jamieprince/jamieprince.github.io/main/correlation.csv"
},
"transform": [
{"calculate": "datum.Educationalattainment/100", "as": "percent"},
{"filter": {"field": "Educationalattainment", "gt": 0}}
],
"layer": [
{
"selection": {
"paintbrush": {"type": "multi", "on": "mouseover", "nearest": true},
"grid": {"type": "interval", "bind": "scales"}
},
"mark": {"type": "circle", "opacity": 0.5, "color": "#EC9D3E"},
"encoding": {
"x": {
"field": "GDP per capita",
"type": "quantitative",
"axis": {
"title": "GDP per capita",
"grid": false,
"tickCount": 10,
"labelOverlap": "greedy"
}
},
"y": {
"field": "percent",
"type": "quantitative",
"axis": {
"title": "Educational Attainment",
"grid": false,
"format": "%"
}
},
"size": {
"condition": {
"selection": "paintbrush",
"value": 300,
"init": {"value": 70}
},
"value": 70
},
"tooltip": [
{"field": "Year", "type": "nominal", "title": "Year"},
{"field": "Country", "type": "ordinal", "title": "Country"},
{
"field": "GDP per capita",
"type": "nominal",
"title": "GDP per capita"
},
{
"field": "Educationalattainment",
"type": "nominal",
"title": "Educational attainment at least completed short-cycle tertiary population 25+ total (%) (cumulative)"
}
]
}
},
{
"mark": {"type": "line", "color": "#347DB6", "size": 3},
"transform": [{"regression": "GDP per capita", "on": "percent"}],
"encoding": {
"x": {"field": "GDP per capita", "type": "quantitative"},
"y": {"field": "percent", "type": "quantitative"}
}
},
{
"transform": [
{
"regression": "GDP per capita",
"on": "percent",
"params": true
},
{"calculate": "'R²: '+format(datum.rSquared, '.2f')", "as": "R2"}
],
"mark": {
"type": "text",
"color": "black",
"x": "width",
"align": "right",
"y": -5
},
"encoding": {
"text": {"type": "nominal", "field": "R2"}
}
}
]
}
You need to simply add the scatter
chart and a line
mark inside a layer which will be stacked on top of each other. Then, perform regression
transform on the line
mark. The transform which is provided in your question seems wrong, as there is no x or y field having Educationalattainment
, so I did the regression on percent
field as it was calculated and derived from the Educationalattainment
field:
"transform": [
{
"regression": "GDP per capita",
"on": "Educationalattainment",
}]
Below is the modified config or refer editor:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"title": {
"text": null,
"subtitle": null,
"subtitleFontStyle": "italic",
"subtitleFontSize": 10,
"anchor": "start",
"color": "black"
},
"height": 100,
"width": 100,
"data": {
"url": "https://raw.githubusercontent.com/jamieprince/jamieprince.github.io/main/correlation.csv"
},
"transform": [
{"calculate": "datum.Educationalattainment/100", "as": "percent"},
{"filter": {"field": "Educationalattainment", "gt": 0}}
],
"layer": [
{
"selection": {
"paintbrush": {"type": "multi", "on": "mouseover", "nearest": true},
"grid": {"type": "interval"}
},
"mark": {"type": "circle", "opacity": 0.5, "color": "#EC9D3E"},
"encoding": {
"x": {
"field": "GDP per capita",
"type": "quantitative",
"axis": {
"title": "GDP per capita",
"grid": false,
"tickCount": 10,
"labelOverlap": "greedy"
}
},
"y": {
"field": "percent",
"type": "quantitative",
"axis": {
"title": "Educational Attainment",
"grid": false,
"format": "%"
}
},
"size": {
"condition": {
"selection": "paintbrush",
"value": 300,
"init": {"value": 70}
},
"value": 70
},
"tooltip": [
{"field": "Year", "type": "nominal", "title": "Year"},
{"field": "Country", "type": "ordinal", "title": "Country"},
{
"field": "GDP per capita",
"type": "nominal",
"title": "GDP per capita"
},
{
"field": "Educationalattainment",
"type": "nominal",
"title": "Educational attainment at least completed short-cycle tertiary population 25+ total (%) (cumulative)"
}
]
}
},
{
"mark": {"type": "line", "color": "#347DB6", "size": 3},
"transform": [{"regression": "GDP per capita", "on": "percent"}],
"encoding": {
"x": {"field": "GDP per capita", "type": "quantitative"},
"y": {"field": "percent", "type": "quantitative"}
}
},
{
"transform": [
{"regression": "GDP per capita", "on": "percent", "params": true},
{"calculate": "'R²: '+format(datum.rSquared, '.2f')", "as": "R2"}
],
"mark": {
"type": "text",
"color": "black",
"x": "width",
"align": "right",
"y": -5
},
"encoding": {"text": {"type": "nominal", "field": "R2"}}
}
]
}
Edit
To show the text, I have removed the bind
config from your grid
selection. After removing it the text was properly visible, this can be an issue or there will be some reason behind it.
Updated the following line in above snippet:
"selection": {
"paintbrush": {"type": "multi", "on": "mouseover", "nearest": true},
"grid": {"type": "interval"}
},