Search code examples
rloopsfor-looplinear-regressionanova

Is there a way to loop through column names (not numbers) in r for linear models?


I have a data sheet with 40 data columns (40 different nutrients), with additional columns for plot numbers and factors. I would like to automatically loop through each column name and produce a linear model and summary for each. The data columns begin at column 10.

for(i in 10:ncol(df)) {       # for-loop over columns
  mod2<-aov(i~block+tillage*residue+Error(subblock),data=df)
  summary(mod2)
}

This is currently producing the error Error in model.frame.default(formula = i ~ subblock, data = df, drop.unused.levels = TRUE) : variable lengths differ (found for 'subblock') Variable lengths are consistent so I imagine I am looping incorrectly.

The data looks similar to below (with more categorical columns at the start), with the nutrient columns beginning at column 10.

block tillage residue subblock nutrient 1 nutrient 2 etc.
b1 NT NR s1 0.5 0.6

Solution

  • If you want the statistics in a table (which might come in handy) you can use the purrr and broom packages. Here's an example using the dataset mtcars:

    Code

    library(tidyr)
    library(purrr)
    library(broom)
    
    formula <- lapply(colnames(mtcars)[3:ncol(mtcars)], function(x) as.formula(paste0(x, " ~ cyl")))
    
    names(formula) <- format(formula)
    
    table <- formula %>% map(~aov(.x, mtcars)) %>% map_dfr(tidy, .id="model")
    

    Output

    > head(table)
    # A tibble: 6 x 7
      model      term         df     sumsq     meansq statistic   p.value
      <chr>      <chr>     <dbl>     <dbl>      <dbl>     <dbl>     <dbl>
    1 disp ~ cyl cyl           1 387454.   387454.        131.   1.80e-12
    2 disp ~ cyl Residuals    30  88731.     2958.         NA   NA       
    3 hp ~ cyl   cyl           1 100984.   100984.         67.7  3.48e- 9
    4 hp ~ cyl   Residuals    30  44743.     1491.         NA   NA       
    5 drat ~ cyl cyl           1      4.34      4.34       28.8  8.24e- 6
    6 drat ~ cyl Residuals    30      4.52      0.151      NA   NA    
    

    Try

    formula <- lapply(colnames(df)[10:ncol(df)], function(x) as.formula(paste0(x, " ~ block + tillage * residue + Error(subblock)")))
    
    names(formula) <- format(formula)
    
    table <- formula %>% map(~aov(.x, df)) %>% map_dfr(tidy, .id="model")