Search code examples
rfor-looplogistic-regression

Problem with for-loop with character list in r


I need to run multiple logistic regression with only one predictor changing each time (other covariates remains the same). I tried to use a for-loop in R.

I receive this error message that I don't understand:

Error in model.frame.default(formula = VD ~ v + covar1 + covar2 + covar3, : 
variable lengths differ (found for 'v')

Here is my code (in reality, I have more than 3 variables to model, but let's say I have only 3):

var_list <- c("var1", "var2", "var3")

mydata$covar3 <- factor(mydata$covar3)

for (v in var_list) {
  
  results <<- glm (VD ~ v + covar1 + covar2 + covar3, 
                data = mydata, family = "binomial")
  
}

I checked that the length of each variable was the same (it was!).

Only covar3 is categorical; covar1 and covar2 are numeric variables.

When I run the same code outside the loop, say for var1 only, everything works fine. How does my loop create this problem ?

Thanks in advance for your help


Solution

  • As an alternative to the answer that I posted with my close vote, to loop through a list of independent variables we can build an expression with the required variable, then evaluate it as part of an apply() function to generate the regression models.

    We'll use mtcars with lm() for a reproducible example, but the same approach will work with glm().

    # list of independent variables
    indepVars <- c("cyl","wt","am")
    
    # use eval() to access the value of x within the formula
    
    theModels <- lapply(indepVars,function(x,dfname){
         lm(paste("mpg ~ ",eval(x)," + disp"),data = dfname) 
    },mtcars)
    
    
    # assign names to list
    names(theModels) <- indepVars
    
    # print the first model
    summary(theModels[["cyl"]])
    

    The tricky part is that we use eval() to evaluate the independent variable name passed as x to lapply() within a paste() function that creates an expression. Then we use that expression within an lm() function.

    ...and the output:

    > summary(theModels[["cyl"]])
    
    Call:
    lm(formula = paste("mpg ~ ", eval(x), " + disp"), data = dfname)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -4.4213 -2.1722 -0.6362  1.1899  7.0516 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept) 34.66099    2.54700  13.609 4.02e-14 ***
    cyl         -1.58728    0.71184  -2.230   0.0337 *  
    disp        -0.02058    0.01026  -2.007   0.0542 .  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 3.055 on 29 degrees of freedom
    Multiple R-squared:  0.7596,    Adjusted R-squared:  0.743 
    F-statistic: 45.81 on 2 and 29 DF,  p-value: 1.058e-09