I need to run multiple logistic regression with only one predictor changing each time (other covariates remains the same). I tried to use a for-loop in R.
I receive this error message that I don't understand:
Error in model.frame.default(formula = VD ~ v + covar1 + covar2 + covar3, :
variable lengths differ (found for 'v')
Here is my code (in reality, I have more than 3 variables to model, but let's say I have only 3):
var_list <- c("var1", "var2", "var3")
mydata$covar3 <- factor(mydata$covar3)
for (v in var_list) {
results <<- glm (VD ~ v + covar1 + covar2 + covar3,
data = mydata, family = "binomial")
}
I checked that the length of each variable was the same (it was!).
Only covar3 is categorical; covar1 and covar2 are numeric variables.
When I run the same code outside the loop, say for var1 only, everything works fine. How does my loop create this problem ?
Thanks in advance for your help
As an alternative to the answer that I posted with my close vote, to loop through a list of independent variables we can build an expression with the required variable, then evaluate it as part of an apply()
function to generate the regression models.
We'll use mtcars
with lm()
for a reproducible example, but the same approach will work with glm()
.
# list of independent variables
indepVars <- c("cyl","wt","am")
# use eval() to access the value of x within the formula
theModels <- lapply(indepVars,function(x,dfname){
lm(paste("mpg ~ ",eval(x)," + disp"),data = dfname)
},mtcars)
# assign names to list
names(theModels) <- indepVars
# print the first model
summary(theModels[["cyl"]])
The tricky part is that we use eval()
to evaluate the independent variable name passed as x
to lapply()
within a paste()
function that creates an expression. Then we use that expression within an lm()
function.
...and the output:
> summary(theModels[["cyl"]])
Call:
lm(formula = paste("mpg ~ ", eval(x), " + disp"), data = dfname)
Residuals:
Min 1Q Median 3Q Max
-4.4213 -2.1722 -0.6362 1.1899 7.0516
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.66099 2.54700 13.609 4.02e-14 ***
cyl -1.58728 0.71184 -2.230 0.0337 *
disp -0.02058 0.01026 -2.007 0.0542 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.055 on 29 degrees of freedom
Multiple R-squared: 0.7596, Adjusted R-squared: 0.743
F-statistic: 45.81 on 2 and 29 DF, p-value: 1.058e-09