Tags: r, loops, dataframe, iterator, modeling

Loop through a data frame and create a model for each column, each evaluated on the same testing set


I have a data frame with more than 20 columns, and for each of these columns I would like to fit a glm model, which I then evaluate on the same testing set. Here is my attempt:

library(ROCR)  # provides prediction() and performance()

# Train-test splitting
smp_size <- floor(0.70 * nrow(x))
index <- sample(seq_len(nrow(x)), size = smp_size)
train <- x[index, ]
test <- x[-index, ]

for (i in 1:22) {
  # Rename the i-th column to 'variab' so the formula can refer to it
  names(train)[names(train) == names(train[i])] <- 'variab'
  names(test)[names(test) == names(test[i])] <- 'variab'

  # Logistic regression of Y on that column
  mod <- glm(Y ~ variab, family = binomial, data = train)

  # Test-set AUC, stored as val_a, val_b, ...
  assign(paste0("val_", letters[i]), as.numeric(performance(
    prediction(predict(mod, newdata = test, type = "response"), test$Y),
    measure = "auc")@y.values[[1]]))
}

However, this doesn't work: it just assigns the name "variab" to each column and ends up running the same model for every column. How can I make this loop iterate through each column in the data frame?
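The symptom can be reproduced on a tiny data frame. This is a sketch with made-up toy data (the data frame `d` and its columns `p`, `q`, `r` are hypothetical, not the asker's `x`): two passes of the renaming step leave two columns called `variab`, and looking that name up returns only the first of them, which is consistent with every model coming out the same.

```r
# Hypothetical toy data frame standing in for the question's x
d <- data.frame(p = 1:3, q = 4:6, r = 7:9, Y = c(0, 1, 0))

# Two passes of the renaming step from the question's loop
for (i in 1:2) {
  names(d)[names(d) == names(d[i])] <- "variab"
}

print(names(d))  # "variab" "variab" "r" "Y"
print(d$variab)  # 1 2 3: only the first column named "variab" is returned
```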


Solution

  • Here's an idea for you; I hope it meets your needs. I don't know where your performance() and prediction() functions came from, so I removed them from my example.

    data(iris)
    predictors <- names(iris)[-1]
    response <- names(iris)[1]

    # Because of an ill-chosen example data set: rescale the response into (0, 1]
    # so binomial glm() accepts it (it will still warn about non-integer successes)
    iris[, response] <- iris[, response] / max(iris[, response])

    # Train-test split
    smp_size <- floor(.7 * nrow(iris))
    set.seed(20171212)
    idx <- sample(seq_len(nrow(iris)), size = smp_size)
    train <- iris[idx, ]
    test <- iris[-idx, ]

    for (i in predictors) {
      # Fresh two-column data frames holding only the current predictor and the response
      tmp.test <- data.frame(pred = get(i, test), resp = get(response, test))
      tmp.train <- data.frame(pred = get(i, train), resp = get(response, train))

      mod <- glm(resp ~ pred, family = binomial, data = tmp.train)

      # Store predictions and actual values as val_Sepal.Width, val_Petal.Length, ...
      assign(paste0("val_", i),
             data.frame(predicted = as.numeric(predict(mod, newdata = tmp.test, type = "response")),
                        actual = get(response, test)))
    }
    

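    Basically, it's what you already did, with one key change: instead of renaming columns of train and test in place, each iteration copies the current predictor and the response into a fresh two-column data frame. In your loop, the renaming leaves several columns named variab after the first pass, so glm() keeps picking up the same one. You were already using the assign() function, and I think of get() as its complement and equally useful. I also prefer iterating over names rather than numeric indexes whenever possible: the loop stays simple, and the current name is at hand for informative cat() messages.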
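If you want the AUC back without depending on ROCR, here is a minimal base-R sketch that keeps the original column names and collects one AUC per predictor in a named vector instead of creating separate variables with assign(). The simulated data frame `x` (columns `a`, `b`, `c` and a 0/1 response `Y`) and the helper `auc()` are hypothetical stand-ins, not the asker's data or ROCR's implementation. The sketch assumes syntactic column names, builds each formula with reformulate(), and swaps in the rank-based (Mann-Whitney) AUC formula.

```r
set.seed(1)
n <- 200
# Hypothetical example data: three numeric predictors, binary response Y
x <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
x$Y <- rbinom(n, 1, plogis(2 * x$a))  # only `a` carries signal

# Train-test split, as in the question
idx <- sample(seq_len(nrow(x)), size = floor(0.70 * nrow(x)))
train <- x[idx, ]
test <- x[-idx, ]

# Hypothetical helper: rank-based (Mann-Whitney) AUC.
# rank() averages tied ranks, so tied positive/negative pairs count as 1/2.
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

predictors <- setdiff(names(x), "Y")

# One model per column (Y ~ a, Y ~ b, ...), each evaluated on the same test set;
# vapply() names the result after the predictor names
auc_by_col <- vapply(predictors, function(col) {
  mod <- glm(reformulate(col, response = "Y"), family = binomial, data = train)
  p <- predict(mod, newdata = test, type = "response")
  auc(p, test$Y)
}, numeric(1))

print(round(auc_by_col, 3))
```

Collecting the results in one named vector makes them easy to compare afterwards, e.g. with sort(auc_by_col, decreasing = TRUE). To use ROCR instead, the auc(p, test$Y) call can be replaced by the performance(prediction(...), measure = "auc") expression from the question.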