Perform model selection with step function and write outcome into 0/1 vector (R)


I am trying to write an algorithm which does the following in R:

  1. On a data set dat, use the step function to perform glm model selection, choosing j covariates from a set of J candidate variables.
  2. Take the j variables in the final call and compare them with the full set of J candidates. Write the outcome into a 1xJ vector, where 1 indicates that the variable is in the final call and 0 otherwise.

Example:

In the following example, three variables (x, y, z) are candidates for predicting the variable dep, and step is used for variable selection. My goal is to end up with a vector indicating which of the input variables are in the final model; here that would be c(1,0,1).

n <- 1000
x <- rnorm(n, 0, 1)
y <- rnorm(n, 0, 1)
z <- rnorm(n, 0, 1)

dep <- 1 + 2*x + 3*z + rnorm(n, 0, 1)

m <- step(lm(dep ~ x + y + z), direction = "backward")

I am having difficulty extracting the variable names from the final m$call and creating the vector.


Solution

  • I think this does it:

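    n <- 1000
    
    x <- rnorm(n, 0, 1)
    y <- rnorm(n, 0, 1)
    z <- rnorm(n, 0, 1)
    
    dep <- 1 + 2*x + 3*z + rnorm(n, 0, 1)
    
    m <- step(lm(dep ~ x + y + z), direction = "backward")
    
    # the terms object of the final model stores the names of the retained variables
    matt <- attributes(m$terms)
    matt$term.labels
    #[1] "x" "z"
    
    # mark each candidate with 1 if it is among the retained terms, 0 otherwise
    v <- c("x", "y", "z")
    as.integer(v %in% matt$term.labels)
    #[1] 1 0 1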
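
The same idea can be wrapped up for an arbitrary set of J candidates. What follows is only a sketch under a few assumptions that are not in the original post: the data sit in a data frame called dat, the helper name selection_indicator is made up for illustration, glm is fitted with its default gaussian family, and set.seed and trace = 0 are added just to make the run repeatable and quiet.

```r
# Hypothetical helper (not from the original answer): returns a named 0/1
# vector marking which candidate variables remain in the selected model.
selection_indicator <- function(model, candidates) {
  kept <- attr(terms(model), "term.labels")  # main-effect names kept by step()
  setNames(as.integer(candidates %in% kept), candidates)
}

set.seed(1)  # assumption: fixed seed so the example is repeatable
n <- 1000
dat <- data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n))
dat$dep <- 1 + 2 * dat$x + 3 * dat$z + rnorm(n)

# every column except the response is treated as a candidate
candidates <- setdiff(names(dat), "dep")

# full model with all J candidates, then backward selection by AIC
full <- glm(reformulate(candidates, response = "dep"), data = dat)
m <- step(full, direction = "backward", trace = 0)

ind <- selection_indicator(m, candidates)
print(ind)
```

Since x and z have strong true effects they should end up with a 1; whether y is dropped depends on the simulated sample, because step selects by AIC. Note that this matching only covers main effects: an interaction would show up in term.labels as something like "x:z" and would not match any single candidate name.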