Problem with character string input for lm() in a loop


I have a for loop that runs a linear regression with a different set of independent and dependent variables on each iteration. However, lm() fails, I believe because I am passing character strings where it expects variable names. I created this simple example to show the problem. Assume var1, var2, and var3 are column names in the dat data frame. I cannot type the column names directly into the formula, so I have to store them as character strings in R variables.

dat <- read.csv("dat.csv")

x1 <- "var1"
x2 <- "var2"
y <- "var3"

lm(y ~ x1 + x2, data = dat) #error

I know the issue here is that R effectively tries to run lm("var3" ~ "var1" + "var2", data = dat). I need help finding which function to apply to y, x1, and x2 so that lm() runs properly.


Solution

  • You can build the formula from the variable-name strings with paste(), convert the result with as.formula(), and pass that formula to lm().

    x1 <- "var1"
    x2 <- "var2"
    y <- "var3"
    
    fm <- as.formula(paste(y, "~", x1, "+", x2, sep=""))
    
    lm(fm, data = dat)
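
Since the question is about a loop, here is a minimal sketch of the same idea applied on every iteration. It uses reformulate() from the stats package, which builds a formula from a character vector of term labels and does the same job as paste() plus as.formula(). The built-in mtcars data set stands in for dat, and the specs list with its column names is a hypothetical placeholder, not something from the original post.

```r
# Sketch only: mtcars stands in for `dat`; `specs` and its column names are
# placeholders chosen for illustration.
dat <- mtcars

# One entry per model: a response name and a character vector of predictor names.
specs <- list(
  list(y = "mpg",  x = c("wt", "hp")),
  list(y = "qsec", x = c("wt", "disp"))
)

fits <- vector("list", length(specs))
for (i in seq_along(specs)) {
  # reformulate() turns the strings into a formula, e.g. mpg ~ wt + hp
  fm <- reformulate(specs[[i]]$x, response = specs[[i]]$y)
  fits[[i]] <- lm(fm, data = dat)
}

# Inspect the fitted coefficients of each model
lapply(fits, coef)
```

Because the predictors are passed as a character vector, the number of independent variables can change from one iteration to the next without rewriting the paste() call.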