I want to run a two-way ANOVA for each of the 10 environmental variables (height, iwdo, rdos, etc., up to no2) to test for differences among period and site.
This should be done separately for three independent watersheds, which are grouped in the stream column.
For each stream I need to check normality with shapiro.test and homoscedasticity with leveneTest. Afterwards I run the model, e.g. for the stream smeltaite and the variable iwdo: aov(iwdo ~ period*site, data = nest_data[nest_data$stream == "smeltaite", ]).
So, is there a way to automate this process for the three streams and, at the same time, repeat it for each environmental variable column, giving me a summary of the shapiro.test, leveneTest and aov results respectively?
Below is the head of my dataset:
nest_data<-structure(list(stream = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label =
c("blendziava",
"smeltaite", "sventoji"), class = "factor"), period = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("February", "March", "April",
"May"), class = c("ordered", "factor")), site = structure(c(1L,
2L, 1L, 2L, 1L, 2L), .Label = c("N", "NN"), class = "factor"),
stake = c("A", "A", "B", "B", "C", "C"), class = c("low",
"medium", "low", "low", "low", "high"), height = c(0, 10,
0, 3.5, 0, 15), iwdo = c(13, 8.37, 10.8, 3.3, 11, 5.3), rdos = c(89.041095890411,
57.3287671232877, 73.972602739726, 22.6027397260274, 75.3424657534247,
36.3013698630137), iwc = c(359, 375, 357, 340, 360, 357),
dwc = c(2, 14, 4, 21, 1, 4), iwt = c(2.2, 2.1, 2.3, 2.3,
2.6, 2.3), dt = c(0, 0.1, 0.0999999999999996, 0.0999999999999996,
0.4, 0.0999999999999996), no3 = c(0.8104551, 0.6300294, 1.1296698,
1.2962166, 0.963123, 1.240701), nh4 = c(0.2187052, 0.1457344,
0.186718, 0.2177056, 0.2297008, 0.2187052), no2 = c(0.0133336,
0.0100408, 0.0116872, 0.0083944, 0.0127848, 0.009492)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
So far I'm using the code:
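library(dplyr)  # provides the %>% pipe
library(broom)  # provides tidy()

# Fit the two-way ANOVA for iwdo separately within each stream
nest_data %>%
  split(.$stream) %>%
  purrr::map(., function(x) {
    aov(iwdo ~ period*site, data = x) %>%
      tidy(.)
  }) -> results
# Stack the per-stream tidy tables into one data frame
df <- as.data.frame(do.call(rbind, results))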
That allows me to perform the test on the three streams, but only on one column.
I presume that I should use a for loop, but I am not sure where to put it inside the function.
Thanks in advance, and I hope I was clear, since this is my first question here!
Consider generalizing all your steps into a single defined function, then call that function iteratively. Base R's by and sapply can help with the iteration, and reformulate lets you build the model formula from the variable name. Please fill in each ellipsis (...).
library(car)  # provides leveneTest()

env_vars <- c("height", "iwdo", "rdos", ..., "no2")

proc_model <- function(sub_df) {
  # NAMED LIST OF ENVIRONMENT VARS MODEL AND TESTS
  sapply(env_vars, function(env) {
    model <- aov(reformulate("period*site", env), data = sub_df)
    sp <- shapiro.test(...)
    lv <- leveneTest(...)

    # NAMED LIST OF MODEL AND TESTS
    list(
      aov_result = model, shapiro_test = sp, levene_test = lv
    )
  }, simplify = FALSE)
}

# NESTED NAMED LIST BY STREAM FOR EACH ENV VAR
results_list <- by(nest_data, nest_data$stream, proc_model)
To access results:
results_list$smeltaite$height$aov_result
results_list$smeltaite$height$shapiro_test
results_list$smeltaite$height$levene_test
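As a small aside on how the formula is put together: reformulate takes the right-hand side as a string and the response as a second argument, so the response can change on every iteration. A minimal sketch, using only base R and the column names from the question:

```r
# Build a formula from strings: the first argument is the right-hand side,
# `response` is the left-hand side.
f <- reformulate("period*site", response = "iwdo")

# Inspect which variables the formula refers to
vars <- all.vars(f)
print(f)
print(vars)
```

Because the response is passed as a plain string, the same call works for any name taken from env_vars.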
To use the same function with your original purrr pipeline:
results <- nest_data %>%
  split(.$stream) %>%
  purrr::map(proc_model)
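The ellipses above are left for you to fill in. One possible way to do so, and to collapse the nested list into a single summary table, is sketched below. Several things here are assumptions rather than part of your setup: the toy data frame is simulated and only stands in for nest_data, only two of the environmental variables are used, the Shapiro-Wilk test is applied to the model residuals, Levene's test is run across the period by site cells, and the car package is assumed to be installed.

```r
library(car)  # assumption: car is installed; it provides leveneTest()

# Hypothetical toy data standing in for nest_data (values are simulated)
set.seed(1)
toy <- expand.grid(
  stream = c("blendziava", "smeltaite", "sventoji"),
  period = c("February", "March"),
  site   = c("N", "NN"),
  rep    = 1:4,
  stringsAsFactors = TRUE
)
toy$iwdo <- rnorm(nrow(toy), mean = 10, sd = 2)
toy$no2  <- rnorm(nrow(toy), mean = 0.01, sd = 0.002)

env_vars <- c("iwdo", "no2")  # shortened list, for illustration only

proc_model <- function(sub_df) {
  sapply(env_vars, function(env) {
    f <- reformulate("period*site", response = env)
    model <- aov(f, data = sub_df)
    list(
      aov_result   = model,
      # Assumption: normality is checked on the model residuals
      shapiro_test = shapiro.test(residuals(model)),
      # Assumption: variances are compared across the period x site cells
      levene_test  = leveneTest(f, data = sub_df)
    )
  }, simplify = FALSE)
}

# Nested list: stream -> environmental variable -> model and tests
results_list <- by(toy, toy$stream, proc_model)

# Flatten to one row per stream and variable, keeping only the p-values
summary_df <- do.call(rbind, lapply(names(results_list), function(s) {
  do.call(rbind, lapply(env_vars, function(v) {
    r <- results_list[[s]][[v]]
    data.frame(
      stream    = s,
      variable  = v,
      shapiro_p = r$shapiro_test$p.value,
      levene_p  = r$levene_test[["Pr(>F)"]][1],
      stringsAsFactors = FALSE
    )
  }))
}))

print(summary_df)
```

The full aov tables stay available in results_list, so summary(results_list$smeltaite$iwdo$aov_result) still gives the complete ANOVA output for any stream and variable. Testing the raw response within each group instead of the residuals is an equally common choice, so adjust the shapiro.test call to match how you want to check normality.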