Say we have two variables Y and X and we want to make a simple regression model with Transform > Run R script
in Power Query. Here we have sample data:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSAeJYnWglYyDLCMwyBbKMwSwLIMtEKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Y = _t, X = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Y", Int64.Type}, {"X", Int64.Type}})
in
#"Changed Type"
Then I add the Run R script
step to Power Query with the following code:
output <- lm(Y ~ X, data=dataset)
But I get an empty table:
How to modify R script to get the model summary?
To my knowledge, in order to retrieve any output from your R script, it has to be in the form of a data.frame
. However, if you try to run something like df<-data.frame(output)
, you'll get the error
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = > > stringsAsFactors) : cannot coerce class ‘"summary.lm"’ to a data.frame
But you can retrieve parts of that very same summary and organize it in a dataframe. Building on your snippet, here's how you can do it for the model coefficients. I've taken the libery to rename your output
to model
.
Code:
model <- lm(Y ~ X, dataset)
df<- data.frame(coef(model))
names(df)[names(df)=="coef.model."] <- "coefficients"
df['variables'] <- row.names(df)
Output 1:
Next, just click Table
to get:
Output 2:
Since I don't have your data, I've just used Enter Data
and this simple data sample:
Edit: Other model estimates
If you'd like to retrieve estimated series from the model such as residuals or fitted values, just add the following lines to the snippet above:
df_estimates <- data.frame(fitted(model), residuals(model))
colnames(df_estimates) <- c('fitted', 'residuals')
The preliminary output will now be:
And this is a pretty hande feature in Power BI. Just click on table
next to the dataframe you'd like to keep working with. In the latter case, you'll get this:
Edit 2 - Include original dataset in your output:
#Y <- c(1,2,3,4,4)
#X <- c(1,2,3,4,5)
#dataset <- data.frame(X, Y)
model <- lm(Y ~ X, dataset)
df<- data.frame(coef(model))
names(df)[names(df)=="coef.model."] <- "coefficients"
df['variables'] <- row.names(df)
df_estimates <- data.frame(dataset$X, dataset$Y, fitted(model), residuals(model))
colnames(df_estimates) <- c('X', 'Y', 'fitted', 'residuals')
df_estimates