Search code examples
rparsingfor-loopevalupdatemodel

R: parse and evaluate variable names within a function


I have a rather huge set of predictors to do a model-selection on, for the sake of simplicity this example will however work with the mtcars dataset. I am somewhat reluctant to take the dredge()-approach, since I want to increase my awareness over which predictors are being chosen and why.

I wrote a function to go through all predictors (via update()) for one step of selection to be able to evaluate their effect size (estimate) and p-value. The output is supposed to be a data.frame where I can see everything rather quickly.

It seems kind of awkward to me. When running the for-loop - after defining model and variables (and everything around it) manually - everything seems to be fine and the expected output (see below) is produced. When running the function, update/eval/parse does not seem to recognize that variables has been defined and is thus not able to compute the updated model.

  1. What am I doing wrong here?
  2. Any suggestions to improve the code beyond that?

function body

find_effects <- function(model, variables=NULL){
  require(car)
  estim <- NULL; p_val <- NULL; stars <- NULL; names <- NULL

  for(i in 1:length(variables)){
    original <- model
    if(is.null(variables)){
      cat("'variables' is NULL")
    } else {
      cat(paste("'variables' is not NULL, it has ", length(variables), " elements"))
    }
    updated <- update(model, . ~ . + eval(parse(text=variables[i])))
    if(length(rownames(summary(original)$coef))==length(rownames(summary(updated)$coef))){
      next
    }
    df.summary <- data.frame(estim=summary(updated)$coef[,1], names=rownames(summary(updated)$coef))
    df.Anova <- data.frame(p_val=Anova(updated, type="III")[,3], names=rownames(Anova(updated, type="III")))
    estim <- c(estim, df.summary$estim[df.summary$names=="eval(parse(text = variables[i]))"])
    p_val <- c(p_val, df.Anova$p_val[df.Anova$names=="eval(parse(text = variables[i]))"])
    stars <- symnum(p_val, corr = FALSE,
                    cutpoints = c(0, .001, .01, .05, .1, 1),
                    symbols = c("***", "**", "*", ".", " "))
    names <- c(names, variables[i])

  }
  df <- noquote(cbind(Estimate=format(estim), "Pr(>Chisq)"=format(p_val), " "=stars))
  rownames(df) <- names
  return(df)

}

library(lme4)
mpgs <- lmer(mpg ~ hp + (1|cyl), data=mtcars, REML=F)

find_effects(mpgs, colnames(mtcars[c(3, 5:11)]))

expected output

         Estimate    Pr(>Chisq)      
    disp -0.02942650 7.331371e-05 ***
    drat  4.69815776 3.449956e-05 ***
    wt   -3.78335019 4.098811e-10 ***
    qsec -0.98805796 1.505408e-02 *  
    vs    0.04636047 9.798979e-01    
    am    5.00527293 1.506875e-06 ***
    gear  3.08916264 5.346748e-05 ***
    carb  0.33074245 6.094443e-01    

Solution

  • the formula for the update when you call the function will be something like this:

    mpg ~ hp + (1|cyl) + eval(parse(text = variables[i]))
    

    The expression is evaluated only after being passed to the update function, where the variables object is not defined. Hence the error.

    create the formula and then pass it to the update function.

    updated <- paste(". ~ . + ", variables[i], sep="")
    updated <- update(model, updated)