Search code examples
rpowerbipowerquerym

How to get linear regression model parameters with R transformation step in Power Query


Say we have two variables Y and X and we want to make a simple regression model with Transform > Run R script in Power Query. Here we have sample data:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSAeJYnWglYyDLCMwyBbKMwSwLIMtEKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Y = _t, X = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Y", Int64.Type}, {"X", Int64.Type}})
in
    #"Changed Type"

Then I add the Run R script step to Power Query with the following code:

output <- lm(Y ~ X, data=dataset)

enter image description here

But I get an empty table:

enter image description here

How to modify R script to get the model summary?


Solution

  • To my knowledge, in order to retrieve any output from your R script, it has to be in the form of a data.frame. However, if you try to run something like df<-data.frame(output), you'll get the error

    Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = > > stringsAsFactors) : cannot coerce class ‘"summary.lm"’ to a data.frame

    But you can retrieve parts of that very same summary and organize it in a dataframe. Building on your snippet, here's how you can do it for the model coefficients. I've taken the libery to rename your output to model.

    Code:

    model <- lm(Y ~ X, dataset)
    df<- data.frame(coef(model))
    names(df)[names(df)=="coef.model."] <- "coefficients"
    df['variables'] <- row.names(df)
    

    Output 1:

    enter image description here

    Next, just click Table to get:

    Output 2:

    enter image description here

    Since I don't have your data, I've just used Enter Data and this simple data sample:

    enter image description here

    Edit: Other model estimates

    If you'd like to retrieve estimated series from the model such as residuals or fitted values, just add the following lines to the snippet above:

    df_estimates <- data.frame(fitted(model), residuals(model))
    colnames(df_estimates) <- c('fitted', 'residuals')
    

    The preliminary output will now be:

    enter image description here

    And this is a pretty hande feature in Power BI. Just click on table next to the dataframe you'd like to keep working with. In the latter case, you'll get this:

    enter image description here

    Edit 2 - Include original dataset in your output:

    #Y <- c(1,2,3,4,4)
    #X <- c(1,2,3,4,5)
    
    #dataset <- data.frame(X, Y)
    
    model <- lm(Y ~ X, dataset)
    df<- data.frame(coef(model))
    names(df)[names(df)=="coef.model."] <- "coefficients"
    df['variables'] <- row.names(df)
    df_estimates <- data.frame(dataset$X, dataset$Y, fitted(model), residuals(model))
    colnames(df_estimates) <- c('X', 'Y', 'fitted', 'residuals')
    df_estimates