How to group and sum using a loop in R?


I have a df in R similar to this one:

taxa <- c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch")
ON1 <- c(2, 45, 34, 90, 0, 39, 12, 11, 5)
ON2 <- c(22, 67, 87, 90, 0, 0, 77, 21, 20)
ON3 <- c(46, 55, 1, 3, 0, 100, 88, 66, 9)
df <- data.frame(taxa, ON1, ON2, ON3)

I would like to group by "taxa" and then sum the numbers.

  • Option 1:
    s <- split(df, df$taxa)
    ON1 <- as.data.frame(lapply(s, function(x) {
        sum(x[, c("ON1")])
    }))
  • Option 2:
    ON1 <- tapply(df$ON1, df$taxa, FUN=sum)
    ON1 <- as.data.frame(ON1)

Result for ON1: bac (210) and arch (28)

Both options do what I want, but I would like a loop so that I can do the same for ON2, ON3, and so on in one go (I have many more columns).

Thanks!


Solution

  • Instead of a loop, it's easier to use tidyverse functions. Group the data frame by your variable with group_by(), then call summarize() with across() to apply sum to every column you want totalled.

    library(tidyverse)
    df %>%
        group_by(taxa) %>%
        summarize(across(ON1:ON3, sum))
    #> # A tibble: 2 × 4
    #>   taxa    ON1   ON2   ON3
    #>   <chr> <dbl> <dbl> <dbl>
    #> 1 arch     28   118   163
    #> 2 bac     210   266   205
    Created on 2021-09-29 by the reprex package (v2.0.1)
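
If you would rather stay in base R, here is a minimal sketch of two alternatives, assuming the data frame from the question with only the three ON columns and no missing values. The names sums_agg, sums_loop and value_cols are made up for this illustration. The first call uses the formula interface of aggregate(); the second is closer to the loop the question asks for and applies tapply() from Option 2 to each column in turn.

```r
# Same example data as in the question (three ON columns).
taxa <- c("bac", "bac", "bac", "bac", "bac", "bac", "arch", "arch", "arch")
ON1 <- c(2, 45, 34, 90, 0, 39, 12, 11, 5)
ON2 <- c(22, 67, 87, 90, 0, 0, 77, 21, 20)
ON3 <- c(46, 55, 1, 3, 0, 100, 88, 66, 9)
df <- data.frame(taxa, ON1, ON2, ON3)

# Formula interface: sum every non-grouping column within each taxa level.
# Returns a data frame with one row per group.
sums_agg <- aggregate(. ~ taxa, data = df, FUN = sum)

# Loop-style version: apply tapply() (as in Option 2) to each value column.
# Returns a matrix with one row per group and one column per ON variable.
value_cols <- setdiff(names(df), "taxa")
sums_loop <- sapply(df[value_cols], function(col) tapply(col, df$taxa, sum))

print(sums_agg)
print(sums_loop)
```

Both objects should hold the same totals as the tibble above. Because value_cols is built from the column names, the same code should pick up further ON columns without changes.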