r tidyverse linear-regression coefficients

Using linear regression coefficients from table to compute values


I have a file with many variable names and coefficients. The task is to use those variable names and coefficients to create a linear regression formula and apply it to data. Here's a small example:

coefs <- tibble(varname = c("(Intercept)", "dxaids", "abnormal_bun"),
                coef = c(-3.1, 0.1, 0.2))

data <- tibble(dxaids = c(0,0,1), abnormal_bun = c(1,0,0))

The goal is a new column, effectively

data %>% mutate(y = -3.1 + 0.1 * dxaids + 0.2 * abnormal_bun)

What I've done for the time being is manually write out the equation with about 25 variables.

Of course I can write an ugly loop for this, shown below, but is there any cleaner way with tidyverse tools? Perhaps this can be accomplished with a single matrix-vector multiply, but dplyr doesn't seem amenable to matrix operations.

# Start from the intercept, then add each remaining term in turn.
y <- coefs$coef[coefs$varname == "(Intercept)"]

for (i in seq_len(nrow(coefs))) {
  varname <- coefs$varname[i]
  coef <- coefs$coef[i]
  if (varname != "(Intercept)")
    y <- y + coef * data[[varname]]  # [[ ]] returns a plain numeric vector
}

Solution

  • You can avoid using a for loop if you use matrix multiplication:

    coefs$coef[1] + (as.matrix(data) %*% coefs$coef[-1])
         [,1]
    [1,] -2.9
    [2,] -3.1
    [3,] -3.0
    

    Just make sure the columns in data are in the same order as coefs$coef[-1]. For example, if the columns in data do not match the coefficient order, you can reorder them by indexing data with the variable names:

    data <- data[, 2:1] # note the order is changed
    coefs$coef[1] + (as.matrix(data[, coefs$varname[-1]]) %*% coefs$coef[-1])
         [,1]
    [1,] -2.9
    [2,] -3.1
    [3,] -3.0
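Since the question asks for a new column, the matrix product above can be wrapped in a `mutate()` call. The following is only a sketch under a few assumptions: the intercept row is looked up by the name "(Intercept)" rather than by position, every other name in `coefs$varname` exists as a column of `data`, and the helper names `intercept`, `slopes`, `X`, and `result` are introduced here for illustration and do not come from the original post.

```r
library(dplyr)

coefs <- tibble(varname = c("(Intercept)", "dxaids", "abnormal_bun"),
                coef = c(-3.1, 0.1, 0.2))
data <- tibble(dxaids = c(0, 0, 1), abnormal_bun = c(1, 0, 0))

# Split the coefficient table by name instead of relying on row position.
intercept <- coefs$coef[coefs$varname == "(Intercept)"]
slopes <- coefs[coefs$varname != "(Intercept)", ]

# Select the predictor columns in coefficient order, so the column order
# of `data` itself does not matter.
X <- as.matrix(data[, slopes$varname])

# drop() turns the n x 1 matrix product into a plain numeric vector,
# which is what mutate() expects for a new column.
result <- data %>%
  mutate(y = intercept + drop(X %*% slopes$coef))
```

Selecting columns by `slopes$varname` also means that any extra columns in `data` without a coefficient are simply ignored in the product, while a coefficient whose column is missing from `data` raises an error instead of silently producing a wrong value.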