Tags: r, for-loop, naivebayes

Returning a column to use in for loop for naive-bayes in R


I'm fitting a naive Bayes model in R. The overall goal is to predict a variable's value, but in this specific task I want to find out which column predicts it best. Here is an example that works (in the real dataset, doing this manually for every column isn't an option):

   library(naivebayes)
   data("mtcars")
   mtcars$vsLog <- as.logical(as.integer(mtcars$vs))
   mtcars_train <- mtcars[1:20,]
   mtcars_test <- mtcars[21:32,]
   car_model <- naive_bayes( data=mtcars_train, vsLog ~ mpg )
   predictions <- predict(car_model,mtcars_test)

What I'm having trouble with is writing a for loop in which the model takes one column at a time and I save how well each model predicted the values. I've looked at different ways to pass the columns as something I can iterate over, but couldn't make any of them work. A minimal reproducible example of my problem is:

   library(naivebayes)
   data("mtcars")
   mtcars$vsLog <- as.logical(as.integer(mtcars$vs))
   mtcars_train <- mtcars[1:20,]
   mtcars_test <- mtcars[21:32,]

   for (j in 1:ncol(mtcars)) {
     car_model <- naive_bayes( data=mtcars_train, vsLog ~ mtcars_train[,j] )
     predictions[j] <- predict(car_model,mtcars_test)
   }

The problem is how to replace mpg in the first example with something I can loop over. Things I've tried: mtcars_train$mpg, unlist(mtcars_train[,j]), and colnames. I really tried googling this, so I hope it's not too silly a question.

Thanks for reading


Solution

  • This might be helpful. If you want to use a for loop, you can use seq_along with the names of the columns you want to loop through in your dataset. You can use reformulate to create the formula from strings: the response (vsLog in your example) and the jth item in your column names as the predictor. In this example, the predict results are stored in a list. Perhaps this will translate to your real dataset.

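    pred_lst <- list()
    
    # Predictor names; skip the response and the vs column it was derived from
    mtcars_names <- setdiff(names(mtcars_train), c("vs", "vsLog"))
    
    for (j in seq_along(mtcars_names)) {
      # reformulate("mpg", "vsLog") builds the formula vsLog ~ mpg
      car_model <- naive_bayes(reformulate(mtcars_names[j], "vsLog"), data=mtcars_train)
      pred_lst[[j]] <- predict(car_model, mtcars_test)
    }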
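
Since the question also asks how to save how well each model did, here is a rough sketch of one way to score the columns. It is not part of the original answer. Several things in it are my assumptions: test-set accuracy as the measure of "how good", the names predictors and accuracy, rows 21:32 as the test set so that no row is in both sets, and leaving out vs and vsLog as predictors because the response is built from vs.

```r
library(naivebayes)

data("mtcars")
mtcars$vsLog <- as.logical(mtcars$vs)
mtcars_train <- mtcars[1:20, ]
mtcars_test  <- mtcars[21:32, ]  # non-overlapping split (a choice made for this sketch)

# Candidate predictors: skip the response and the column it was derived from
predictors <- setdiff(names(mtcars_train), c("vs", "vsLog"))

# One accuracy value per predictor, named by column
accuracy <- setNames(numeric(length(predictors)), predictors)

for (col in predictors) {
  # Build vsLog ~ <col> from the column name and fit a single-predictor model
  model <- naive_bayes(reformulate(col, response = "vsLog"), data = mtcars_train)

  # Predict classes on the test rows, passing only the column the model uses
  pred <- predict(model, newdata = mtcars_test[, col, drop = FALSE])

  # Share of test rows where the predicted class matches the true value
  accuracy[col] <- mean(as.character(pred) == as.character(mtcars_test$vsLog))
}

# Columns ordered from most to least accurate on the test rows
sort(accuracy, decreasing = TRUE)
```

With only 12 test rows the ranking will be noisy, so on a real dataset something like cross-validation would probably give a more reliable comparison.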