I have a rather huge set of predictors to do a model-selection on, for the sake of simplicity this example will however work with the mtcars
dataset.
I am somewhat reluctant to take the dredge()
-approach, since I want to increase my awareness over which predictors are being chosen and why.
I wrote a function to go through all predictors (via update()
) for one step of selection to be able to evaluate their effect size (estimate) and p-value. The output is supposed to be a data.frame where I can see everything rather quickly.
It seems kind of awkward to me. When running the for-loop - after defining model
and variables
(and everything around it) manually - everything seems to be fine and the expected output (see below) is produced. When running the function, update/eval/parse
does not seem to recognize that variables
has been defined and is thus not able to compute the updated model.
function body
find_effects <- function(model, variables=NULL){
require(car)
estim <- NULL; p_val <- NULL; stars <- NULL; names <- NULL
for(i in 1:length(variables)){
original <- model
if(is.null(variables)){
cat("'variables' is NULL")
} else {
cat(paste("'variables' is not NULL, it has ", length(variables), " elements"))
}
updated <- update(model, . ~ . + eval(parse(text=variables[i])))
if(length(rownames(summary(original)$coef))==length(rownames(summary(updated)$coef))){
next
}
df.summary <- data.frame(estim=summary(updated)$coef[,1], names=rownames(summary(updated)$coef))
df.Anova <- data.frame(p_val=Anova(updated, type="III")[,3], names=rownames(Anova(updated, type="III")))
estim <- c(estim, df.summary$estim[df.summary$names=="eval(parse(text = variables[i]))"])
p_val <- c(p_val, df.Anova$p_val[df.Anova$names=="eval(parse(text = variables[i]))"])
stars <- symnum(p_val, corr = FALSE,
cutpoints = c(0, .001, .01, .05, .1, 1),
symbols = c("***", "**", "*", ".", " "))
names <- c(names, variables[i])
}
df <- noquote(cbind(Estimate=format(estim), "Pr(>Chisq)"=format(p_val), " "=stars))
rownames(df) <- names
return(df)
}
library(lme4)
mpgs <- lmer(mpg ~ hp + (1|cyl), data=mtcars, REML=F)
find_effects(mpgs, colnames(mtcars[c(3, 5:11)]))
expected output
Estimate Pr(>Chisq)
disp -0.02942650 7.331371e-05 ***
drat 4.69815776 3.449956e-05 ***
wt -3.78335019 4.098811e-10 ***
qsec -0.98805796 1.505408e-02 *
vs 0.04636047 9.798979e-01
am 5.00527293 1.506875e-06 ***
gear 3.08916264 5.346748e-05 ***
carb 0.33074245 6.094443e-01
the formula for the update when you call the function will be something like this:
mpg ~ hp + (1|cyl) + eval(parse(text = variables[i]))
The expression is evaluated only after being passed to the update function, where the variables object is not defined. Hence the error.
create the formula and then pass it to the update function.
updated <- paste(". ~ . + ", variables[i], sep="")
updated <- update(model, updated)