
vglm: how to loop over variable strings in the model formula?


I've found a handy way to code a loop that calls vglm(), but it only works with a single variable per model:

R: varlist
[1] "X2"  "X7"  "X17" "X18" "X33"
models <- lapply(varlist, function(x) {
  vglm(substitute(class ~ i, list(i = as.name(x))), data = train.data[c(-1)], family = multinomial())
})

Since I want to perform variable selection with AIC, I run the following:

resAIC = lapply(models, AIC)
R: resAIC
[[1]]
[1] 11918.26

[[2]]
[1] 11917.55

[[3]]
[1] 11919.45

[[4]]
[1] 11926.03

[[5]]
[1] 11923.2

X18 is the first variable I keep, so for the next round of vglm() calls I have to update the variable list so that each candidate is combined with it. It is now:

R: varlist
[1] "X18+X2"  "X18+X7"  "X18+X17" "X18+X33"

Calling models <- lapply(varlist, function(x) { vglm(...) }) again then gives the following error:

Error in eval(expr, envir, enclos) : object 'X18+X2' not found
Called from: eval(expr, envir, enclos)

How should I modify the code so that it is more general and accepts "X2", "X18 + X2", "X18 + X2 + X33", etc. when calling vglm()?

Thanks in advance


Solution

  • Rather than building a formula dynamically, I would suggest subsetting the columns of your data.frame and using class ~ . as the formula, so there is no need to build strings with pluses.

    # SAMPLE DATA: a class column plus 12 predictors, auto-named X1..X12
    train.data <- data.frame(class = sample(1:5, 50, replace = TRUE),
        matrix(runif(50 * 12), ncol = 12))
    
    library(VGAM)
    varlist <- list("X2", c("X8","X2"), c("X8","X2","X11"))
    models <- lapply(varlist, function(x) {
        vglm(class ~ ., data = train.data[, c("class", x)], family = multinomial())
    })
    

    You can extract the AIC and terms with

    sapply(models, AIC)
    sapply(models, function(x) attr(terms(x), "term.labels"))
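
If you would rather keep the "X18+X2" style strings from the question, the error seems to come from as.name(), which turns the whole string into a single symbol called `X18+X2` instead of parsing it as two terms. A minimal sketch of one alternative is to parse each string into a formula with as.formula(). The data below are made-up sample data like the ones above (with class stored as a factor, which is my assumption, not something the question states), and the names formulas, aic and best are only for illustration:

```r
library(VGAM)

# Made-up sample data: a factor response plus predictors X1..X12
set.seed(1)
train.data <- data.frame(class = factor(sample(1:5, 50, replace = TRUE)),
                         matrix(runif(50 * 12), ncol = 12))

# Right-hand sides as plain strings, as in the question
varlist <- c("X2", "X8+X2", "X8+X2+X11")

# as.formula() parses the string, so "+" separates terms
formulas <- lapply(varlist, function(x) as.formula(paste("class ~", x)))

# Fit one multinomial model per formula
models <- lapply(formulas, function(f) {
  vglm(f, data = train.data, family = multinomial())
})

# AIC per model, and the right-hand side with the smallest AIC
aic  <- sapply(models, AIC)
best <- varlist[which.min(aic)]
```

If the terms are held as character vectors rather than one pasted string, base R's reformulate(c("X8", "X2"), response = "class") builds the same kind of formula without any pasting.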