r loops anova

How to use nested loops to run ANOVAs with different predictors and outcomes


I have been trying to estimate multiple ANOVAs at the same time with a loop. I want to loop through both multiple predictors and multiple outcomes, so I have been trying to write a nested loop.

#data
test<-structure(list(Alcohol = c(1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 
0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L), Smoker = c(0, 
0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1
), CXMP = c(1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 
1, 0, 1, 1, 0), CXDIAG = c(1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 
0, 1, 1, 1, 1, 1, 1, 0, 0, 1), Treatment = c(2, 2, 1, 2, 1, 0, 
2, 0, 0, 0, 2, 2, 0, 2, 0, 0, 2, 2, 1, 1, 2, 1), metformin_base = c(1L, 
1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 
1L, 0L, 0L, 1L, 0L), BMI = c(38.17, 34.14, 39.55, 49.68, 41.44, 
43.23, 41.65, 53.11, 45.04, 46.78, 52.42, 51.36, 60.7, 48.36, 
53.31, 43.29, 57.44, 53.44, 40.54, 41.2, 55.36, 33.95), Waist = c(120, 
118.5, 129.5, 144, 133.7, 121, 118.7, 139, 120.1, 131.5, 121.5, 
115, 160, 154.1, 147, 128, 134, 132.5, 118, 129, NA, NA), age = c(74.52977413, 
38.02327173, 41.08966461, 63.80013689, 22.12457221, 61.06502396, 
61.55509925, 32.47638604, 65.60438056, 68.6899384, 55.86584531, 
39.52908967, 55.69883641, 57.83709788, 52.98288843, 32.678987, 
63.43052704, 51.29637235, 52.11225188, 67.9945243, 66.7926078, 
38.80903491), charleston = c(5L, 0L, 0L, 2L, 0L, 3L, 2L, 0L, 
3L, 2L, 1L, 0L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 3L, 0L), FOOD_Fruit = c(1, 
1.5, 1, 1, NA, 1, 2, 1, NA, 2, 0, 0, 2.5, 2, 2, 2, 3.5, 2, 2, 
3, 3, 2), FOOD_Vegetable = c(3, 3.5, 2, 2, NA, 1, 1, 2, 2, 3, 
2, 0, 3, 3, 3, 3, 1.5, 2.5, 2.5, 2, 5, 5), exercisemin = c(0L, 
30L, 20L, 0L, NA, 0L, NA, 85L, NA, 0L, 0L, NA, 0L, 80L, 30L, 
10L, 60L, 0L, 0L, 0L, 15L, 60L)), row.names = c(NA, 22L), class = "data.frame")

#data transformations
library(dplyr) #provides %>%, mutate() and across()

#subset of categorical variables (does not include Charlson or BMIfactor)
catvars<-subset(test, 
 select=c(Alcohol,Smoker,CXMP,CXDIAG,Treatment,metformin_base))

#convert the subset of categorical variables into factors
catvars <- catvars %>%
 mutate(across(everything(), factor))

#subset of continuous variables
contvars<-subset(test, select=c(BMI,Waist,age,charleston, 
 FOOD_Fruit,FOOD_Vegetable,exercisemin))

contvars <- as.data.frame(lapply(contvars, as.numeric))

I have tried all sorts of things: running the loop with the predictors in the same data frame, with and without paste0, with and without as.formula, with different types of loop functions, with different types of ANOVA functions, etc. My plan was to fit each combination as a linear model and then get the ANOVA summary of the results.

#linear model
anovas<-for(i in colnames(contvars)) {                           
 for(j in colnames(catvars)) {
 lm(as.formula(paste0(i , "~" , j)), 
 data=cbind(contvars,catvars))
 }
}

#What I plan to use to get the summary once the loop works:
summary(aov(anovas))

The loop is where I get stuck. No matter what I do, it throws an error, and the errors have varied widely. I am not sure what I am doing wrong. With this syntax, the object shows up as "NULL".


Solution

  • There are a few issues here.

    • cbind() may be unsafe; data.frame() is safer
    • a for loop does not return the values computed inside it; in R the value of a for loop is always NULL, which is why anovas shows up as NULL. You need to store each result yourself, for example in a list
    • there were a few typos (missing parentheses etc.)
    • you could store the results in a nested list as in @nrennie's answer, but I felt it would be easier downstream to have them stored in a single named list, using paste(i,j,sep=".") as the name (I originally did this with a k index that I incremented at each step).
    combdata <- data.frame(contvars,catvars)
    res <- list()
    for(i in colnames(contvars)) {                           
        for(j in colnames(catvars)) {
            res[[paste(i,j,sep=".")]] <- lm(as.formula(paste0(i , "~" , j)),
                           combdata)
        }
    }
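
The NULL in the question comes from how for itself behaves, independent of lm(). A minimal sketch with toy values (the names out and squares are made up for illustration):

```r
# A for loop is run for its side effects; the value of the loop itself is NULL
out <- for (k in 1:3) k^2
is.null(out)

# Assigning into a list inside the loop keeps each result
squares <- list()
for (k in 1:3) {
  squares[[paste0("k", k)]] <- k^2
}
length(squares)
```

The same pattern is what res[[paste(i,j,sep=".")]] <- lm(...) does above: the assignment inside the loop body is what preserves each fitted model.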
    

    You could use something like sapply(res, function(x) summary(x)$r.squared) to summarize the results.
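
If the goal is the ANOVA table rather than R-squared, one option is to apply anova() to each fitted model; for a model with a single factor this should give the same F test as summary(aov(...)). The sketch below is self-contained and uses the built-in mtcars data as a stand-in for the question's data, so the variable names (mpg, hp, cyl, gear) are assumptions chosen only for illustration:

```r
# Stand-in data: mtcars ships with R; cyl and gear are converted to factors
dat <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))
outcomes   <- c("mpg", "hp")    # continuous outcomes
predictors <- c("cyl", "gear")  # categorical predictors

# Same nested loop as above, storing each model in a named list
res <- list()
for (i in outcomes) {
  for (j in predictors) {
    res[[paste(i, j, sep = ".")]] <- lm(as.formula(paste0(i, "~", j)), data = dat)
  }
}

# One ANOVA table per fitted model
anova_tables <- lapply(res, anova)

# p-value of the F test for the predictor (first row of each table)
p_values <- sapply(anova_tables, function(tab) tab[["Pr(>F)"]][1])
p_values
```

With the question's data, res from the loop above can be passed to lapply(res, anova) in the same way. Because many tests are run at once, the resulting p-values may warrant a multiple-comparison adjustment such as p.adjust().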