I have a data sheet with 40 data columns (40 different nutrients), with additional columns for plot numbers and factors. I would like to automatically loop through each column name and produce a linear model and summary for each. The data columns begin at column 10.
for(i in 10:ncol(df)) { # for-loop over columns
mod2<-aov(i~block+tillage*residue+Error(subblock),data=df)
summary(mod2)
}
This is currently producing the error Error in model.frame.default(formula = i ~ subblock, data = df, drop.unused.levels = TRUE) : variable lengths differ (found for 'subblock')
Variable lengths are consistent so I imagine I am looping incorrectly.
The data looks similar to below (with more categorical columns at the start), with the nutrient columns beginning at column 10.
block | tillage | residue | subblock | nutrient 1 | nutrient 2 | etc. |
---|---|---|---|---|---|---|
b1 | NT | NR | s1 | 0.5 | 0.6 |
If you want the statistics in a table (which might come in handy) you can use the purrr
and broom
packages. Here's an example using the dataset mtcars
:
Code
library(tidyr)
library(purrr)
library(broom)
formula <- lapply(colnames(mtcars)[3:ncol(mtcars)], function(x) as.formula(paste0(x, " ~ cyl")))
names(formula) <- format(formula)
table <- formula %>% map(~aov(.x, mtcars)) %>% map_dfr(tidy, .id="model")
Output
> head(table)
# A tibble: 6 x 7
model term df sumsq meansq statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 disp ~ cyl cyl 1 387454. 387454. 131. 1.80e-12
2 disp ~ cyl Residuals 30 88731. 2958. NA NA
3 hp ~ cyl cyl 1 100984. 100984. 67.7 3.48e- 9
4 hp ~ cyl Residuals 30 44743. 1491. NA NA
5 drat ~ cyl cyl 1 4.34 4.34 28.8 8.24e- 6
6 drat ~ cyl Residuals 30 4.52 0.151 NA NA
Try
formula <- lapply(colnames(df)[10:ncol(df)], function(x) as.formula(paste0(x, " ~ block + tillage * residue + Error(subblock)")))
names(formula) <- format(formula)
table <- formula %>% map(~aov(.x, df)) %>% map_dfr(tidy, .id="model")