I am trying to find model with lowest AIC. Models are returned from two for loops that make possible combinations of columns. I am unable to make the function return model with lowest AIC. The code below demonstrates where I got stuck:
rm(list = ls())
data <- iris
data <- data[data$Species %in% c("setosa", "virginica"),]
data$Species = ifelse(data$Species == 'virginica', 0, 1)
mod_headers <- names(data[1:ncol(data)-1])
f <- function(mod_headers){
for(i in 1:length(mod_headers)){
tab <- combn(mod_headers,i)
for(j in 1:ncol(tab)){
tab_new <- c(tab[,j])
mod_tab_new <- c(tab_new, "Species")
model <- glm(Species ~., data=data[c(mod_tab_new)], family = binomial(link = "logit"))
}
}
best_model <- model[which(AIC(model)[order(AIC(model))][1])]
print(best_model)
}
f(mod_headers)
Any suggestions? Thanks!
glm() uses an iterative re-weighted least squares algorithm. The algorithm reaches the maximum number of iterations before it converges - changing this parameter helps in your case:
glm(Species ~., data=data[mod_tab_new], family = binomial(link = "logit"), control = list(maxit = 50))
There was another issue using which
, I replaced it with an if
after each model fit to compare to the lowest AIC so far. However, I think there are better solutions than this for-loop
approach.
f <- function(mod_headers){
lowest_aic <- Inf # added
best_model <- NULL # added
for(i in 1:length(mod_headers)){
tab <- combn(mod_headers,i)
for(j in 1:ncol(tab)){
tab_new <- tab[, j]
mod_tab_new <- c(tab_new, "Species")
model <- glm(Species ~., data=data[mod_tab_new], family = binomial(link = "logit"), control = list(maxit = 50))
if(AIC(model) < lowest_aic){ # added
lowest_aic <- AIC(model) # added
best_model <- model # added
}
}
}
return(best_model)
}