r function automation anova summary

Repeated two-way ANOVA on multiple columns, sorted per factor level, in R


I want to test, with a two-way ANOVA for each of the 10 environmental variables (height, iwdo, rdos, etc., through no2), for differences among period and site. I need to do this for three independent watersheds, identified by the stream column.

For each stream I need to check normality with shapiro.test and homoscedasticity with leveneTest. Afterwards I run the model aov(environmental variable (e.g. iwdo) ~ period*site, data = nest_data[nest_data$stream == "stream name (e.g. smeltaite)", ]).

So, is there a way to automate this process for the three streams and, at the same time, repeat it on each column of environmental variables, giving me a summary of the shapiro.test, leveneTest and aov results respectively?

Below is the head of my dataset:

nest_data<-structure(list(stream = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = 
c("blendziava", 
"smeltaite", "sventoji"), class = "factor"), period = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("February", "March", "April", 
"May"), class = c("ordered", "factor")), site = structure(c(1L, 
2L, 1L, 2L, 1L, 2L), .Label = c("N", "NN"), class = "factor"), 
   stake = c("A", "A", "B", "B", "C", "C"), class = c("low", 
   "medium", "low", "low", "low", "high"), height = c(0, 10, 
   0, 3.5, 0, 15), iwdo = c(13, 8.37, 10.8, 3.3, 11, 5.3), rdos = c(89.041095890411, 
   57.3287671232877, 73.972602739726, 22.6027397260274, 75.3424657534247, 
   36.3013698630137), iwc = c(359, 375, 357, 340, 360, 357), 
   dwc = c(2, 14, 4, 21, 1, 4), iwt = c(2.2, 2.1, 2.3, 2.3, 
   2.6, 2.3), dt = c(0, 0.1, 0.0999999999999996, 0.0999999999999996, 
   0.4, 0.0999999999999996), no3 = c(0.8104551, 0.6300294, 1.1296698, 
   1.2962166, 0.963123, 1.240701), nh4 = c(0.2187052, 0.1457344, 
   0.186718, 0.2177056, 0.2297008, 0.2187052), no2 = c(0.0133336, 
   0.0100408, 0.0116872, 0.0083944, 0.0127848, 0.009492)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

So far I'm using the code:

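library(dplyr)  # provides %>%
library(broom)  # provides tidy()

# Fit the two-way ANOVA for iwdo separately within each stream
nest_data %>% 
 split(.$stream) %>% 
 purrr::map(.,function(x){
     aov(iwdo ~ period*site, data = x) %>%
         tidy(.)
 }) -> results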

df <- as.data.frame(do.call(rbind,results))

This lets me run the test on the three streams, but only on one column. I presume I should use a for loop, but I'm not sure where to put it inside the function.

Thanks in advance, and I hope I was clear, since this is my first question here!



Solution

  • Consider generalizing all your steps into a single function, then call that function iteratively; the base R functions by and sapply can help with this. Use reformulate to build the formula for each variable. Please fill in each ellipsis (...).

    env_vars <- c("height", "iwdo", "rdos", ..., "no2")
    
    proc_model <- function(sub_df) {
        # NAMED LIST OF ENVIRONMENT VARS MODEL AND TESTS
        sapply(env_vars, function(env) {
            model <- aov(reformulate("period*site", env), data = sub_df)
            sp <- shapiro.test(...)
            lv <- leveneTest(...)
    
            # NAMED LIST OF MODEL AND TESTS
            list(
                aov_result = model, shapiro_test = sp, levene_test = lv
            )
        }, simplify=FALSE)
    }
    
    # NESTED NAMED LIST BY STREAM FOR EACH ENV VAR
    results_list <- by(nest_data, nest_data$stream, proc_model)
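
As one possible way to fill in the ellipses above, the sketch below runs the Shapiro-Wilk test on the model residuals and Levene's test across the period-by-site cells. Both choices are assumptions rather than part of the answer: the normality test could equally be run on the raw values, leveneTest is assumed to come from the car package, and proc_model_filled and the two-variable env_vars_demo are hypothetical names used only for illustration. It expects the full dataset with replicates in every period-by-site cell; the six-row head shown in the question is too small for these tests.

```r
library(car)  # assumed source of leveneTest()

# Hypothetical subset of the environmental variables, for illustration only
env_vars_demo <- c("iwdo", "no2")

proc_model_filled <- function(sub_df, vars = env_vars_demo) {
    sapply(vars, function(env) {
        # Builds e.g. iwdo ~ period * site from the column name
        fml <- reformulate("period*site", response = env)
        model <- aov(fml, data = sub_df)

        # Assumption: normality is checked on the model residuals
        sp <- shapiro.test(residuals(model))

        # Assumption: variances are compared across period-by-site cells
        lv <- leveneTest(fml, data = sub_df)

        # Named list of model and tests for this variable
        list(aov_result = model, shapiro_test = sp, levene_test = lv)
    }, simplify = FALSE)
}
```

It could then be passed to by in place of proc_model, for example by(nest_data, nest_data$stream, proc_model_filled).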
    

    To access results:

    results_list$smeltaite$height$aov_result
    results_list$smeltaite$height$shapiro_test
    results_list$smeltaite$height$levene_test
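
Since the question asks for a summary, the nested list can also be flattened into one row per stream and variable. This is a minimal sketch under stated assumptions: summarise_results and its column names are hypothetical, shapiro_test is assumed to be the object returned by shapiro.test, and levene_test is assumed to be the table returned by car's leveneTest, whose first row of the Pr(>F) column holds the p-value.

```r
# Hypothetical helper: one row of p-values per stream and variable
summarise_results <- function(results_list) {
    rows <- list()
    for (stream in names(results_list)) {
        for (env in names(results_list[[stream]])) {
            res <- results_list[[stream]][[env]]

            # ANOVA table; its row names are padded with spaces, so trim them
            aov_tab <- summary(res$aov_result)[[1]]
            p_aov <- aov_tab[["Pr(>F)"]]
            names(p_aov) <- trimws(rownames(aov_tab))

            rows[[length(rows) + 1]] <- data.frame(
                stream = stream,
                variable = env,
                shapiro_p = unname(res$shapiro_test$p.value),
                levene_p = res$levene_test[["Pr(>F)"]][1],
                p_period = p_aov[["period"]],
                p_site = p_aov[["site"]],
                p_interaction = p_aov[["period:site"]],
                row.names = NULL
            )
        }
    }
    do.call(rbind, rows)
}
```

Calling summarise_results(results_list) would then return a single data frame that can be filtered or exported.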
    

    For your original implementation:

    results <- nest_data %>% 
     split(.$stream) %>% 
     purrr::map(proc_model)
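
Whichever way the data are split by stream, the loop over columns rests on two base R behaviours: reformulate builds a formula from character strings, and sapply with simplify = FALSE returns a list named after its input. The short sketch below illustrates both in isolation; the two-element vars vector is only an example, not the full list of variables.

```r
# reformulate() builds a formula from character strings
fml <- reformulate("period*site", response = "iwdo")
print(fml)

# With simplify = FALSE, sapply() returns a list named after its input,
# which is what makes results_list$<stream>$<variable> indexing possible
vars <- c("iwdo", "no2")
formulas <- sapply(vars,
                   function(v) reformulate("period*site", response = v),
                   simplify = FALSE)
print(names(formulas))
```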