I have a data frame with more than 20 columns, and for each column I would like to fit a glm model and then evaluate it on the same test set. Here is my attempt:
# Train-test splitting
smp_size <- floor(0.70 * nrow(x))
index <- sample(seq_len(nrow(x)),size = smp_size)
train <- x[index, ]
test <- x[-index, ]
for (i in 1:22) {
names(train)[names(train) == names(train[i])] <- 'variab'
names(test)[names(test) == names(test[i])] <- 'variab'
mod <- glm(Y ~ variab, family = binomial, data = train)
assign(paste0("val", sep = "_", letters[i]), as.numeric(performance(
prediction(predict(mod, newdata = test, type = "response"),test$Y),
measure = "auc")@y.values[[1]]))
}
However, this doesn't work: it just assigns the name "variab" to each column and ends up running the same model for each column. How can I make this loop iterate through each column in the data frame?
Here's an idea for you. I hope this meets your needs. I don't know where your performance() or prediction() functions came from, so I removed them from my example.
data(iris)
predictors <- names(iris)[-1]
response <- names(iris)[1]
# iris is an ill-chosen example: rescale the response into [0, 1] so a binomial glm accepts it
iris[,response] <- iris[,response]/max(iris[,response])
# sample
smp_size <- floor(.7*nrow(iris))
set.seed(20171212)
idx <- sample(seq_len(nrow(iris)), size=smp_size)
train <- iris[idx,]
test <- iris[-idx,]
for (i in predictors) {
  # build fresh two-column frames so the formula resp ~ pred never changes
  tmp.test <- data.frame(pred = get(i, test), resp = get(response, test))
  tmp.train <- data.frame(pred = get(i, train), resp = get(response, train))
  mod <- glm(resp ~ pred, family = binomial, data = tmp.train)
  # store predictions next to the observed values as val_<column name>
  assign(paste0("val_", i),
         data.frame(predicted = as.numeric(predict(mod, newdata = tmp.test, type = "response")),
                    actual = get(response, test)))
}
Basically, it's what you already did. The difference is that each iteration builds fresh temporary data frames instead of renaming columns in place, so a rename from one pass can't carry over into the next. You were already using the assign() function, and I think of get() as its complement and equally useful. I'm also a proponent of iterating through names rather than numerical indexes when I use a loop, because it's simple and makes it easy to write effective cat() messages.
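If you'd rather not fill your workspace with val_* objects, here is a minimal sketch of the same idea that collects one AUC per column in a named vector. A few things here are my assumptions and not from your post: the binary Y built from iris, the helper name auc(), and the rank-based (Mann-Whitney) AUC in base R, which I swapped in because I don't have your performance() and prediction() functions. reformulate() builds the formula Y ~ <column> from the column name, so nothing gets renamed.

```r
# Hypothetical binary response built from iris; replace with your own data and Y.
dat <- iris[, 1:4]
dat$Y <- as.integer(iris$Species == "virginica")

# Same 70/30 split as above
set.seed(20171212)
idx <- sample(seq_len(nrow(dat)), size = floor(0.7 * nrow(dat)))
train <- dat[idx, ]
test <- dat[-idx, ]

# Rank-based (Mann-Whitney) AUC in base R; a stand-in for whatever
# AUC function you actually use.
auc <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

predictors <- setdiff(names(dat), "Y")

# One model per column, all evaluated on the same test set
aucs <- sapply(predictors, function(p) {
  mod <- glm(reformulate(p, response = "Y"), family = binomial, data = train)
  auc(predict(mod, newdata = test, type = "response"), test$Y)
})
```

After this, aucs is a numeric vector named by column, so aucs["Petal.Width"] or sort(aucs) works directly, which I find easier to handle than a pile of separately assigned variables.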