Tags: r, for-loop, tidyverse, lapply

Looping over groups in R


I have a data frame, df, built by row-binding three data frames, df1, df2, and df3, each of which follows this structure:

df1 <- data.frame(year = c("2013", "2013", "2013", "2013", "2013","2013"), 
                  site = c("a", "a", "a", "a", "a", "a"),
                  trt = c("x", "y", "x", "y", "x", "y"),
                  cover = c(2, 5, 1,20,50,12))

df2 <- data.frame(year = c("2014", "2014", "2014", "2014", "2014","2014"),
                  site = c("a", "a", "a", "a", "a", "a"),
                  trt = c("x", "y", "x", "y", "x", "y"),
                  cover = c(1, 3, 1,24,32,12))

df3 <- data.frame(year = c("2015", "2015", "2015", "2015", "2015","2015"),
                  site = c("a", "a", "a", "a", "a", "a"),
                  trt = c("x", "y", "z", "z", "x", "y"),
                  cover = c(2, 5, 1,2,11,32))

df <- rbind(df1, df2, df3)
df

   year site trt cover
1  2013    a   x     2
2  2013    a   y     5
3  2013    a   x     1
4  2013    a   y    20
5  2013    a   x    50
6  2013    a   y    12
7  2014    a   x     1
8  2014    a   y     3
9  2014    a   x     1
10 2014    a   y    24
11 2014    a   x    32
12 2014    a   y    12
13 2015    a   x     2
14 2015    a   y     5
15 2015    a   z     1
16 2015    a   z     2
17 2015    a   x    11
18 2015    a   y    32

I used to rank the values in the cover column within each year, using a for loop:

library(dplyr)

v1 <- unique(df$year)
lst <- list()

for (i in seq_along(v1)) {
  lst[[i]] <- df |> 
    filter(year == v1[i]) |> 
    mutate(rank = dense_rank(desc(cover)))
}

Now I am trying to rank the values within each group (as defined by the trt column) for each year, but I am having trouble figuring out how to do so. How can I do this with a for loop? I am also open to an answer that uses lapply, as I would like to learn about it.


Solution

  • Using dplyr, we can avoid both the loop and the filtering by calling group_by() before mutate(), and then build the list with group_split(). Grouping by year alone, as here, reproduces the result of the loop in the question.

    library(dplyr)
    
    df |>
      group_by(year) |>
      mutate(rank = dense_rank(desc(cover))) |>
      group_split()
    

    Output:

    [[1]]
    # A tibble: 6 × 5
      year  site  trt   cover  rank
      <chr> <chr> <chr> <dbl> <int>
    1 2013  a     x         2     5
    2 2013  a     y         5     4
    3 2013  a     x         1     6
    4 2013  a     y        20     2
    5 2013  a     x        50     1
    6 2013  a     y        12     3
    
    [[2]]
    # A tibble: 6 × 5
      year  site  trt   cover  rank
      <chr> <chr> <chr> <dbl> <int>
    1 2014  a     x         1     5
    2 2014  a     y         3     4
    3 2014  a     x         1     5
    4 2014  a     y        24     2
    5 2014  a     x        32     1
    6 2014  a     y        12     3
    
    [[3]]
    # A tibble: 6 × 5
      year  site  trt   cover  rank
      <chr> <chr> <chr> <dbl> <int>
    1 2015  a     x         2     4
    2 2015  a     y         5     3
    3 2015  a     z         1     5
    4 2015  a     z         2     4
    5 2015  a     x        11     2
    6 2015  a     y        32     1
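
  • The output above ranks cover within each year only. To rank within each trt for each year, which is what the question asks for, one possible sketch is to group by both columns before ranking. The names by_year and by_year_lapply are made up for this illustration, and the data frame is rebuilt compactly so the snippet runs on its own. The second variant uses lapply() over split(), since the question asks about it; treat both as a starting point rather than the only way to do this.

    ```r
    library(dplyr)

    # Same data as in the question, rebuilt compactly
    df <- data.frame(
      year  = rep(c("2013", "2014", "2015"), each = 6),
      site  = "a",
      trt   = c(rep(c("x", "y"), 6), "x", "y", "z", "z", "x", "y"),
      cover = c(2, 5, 1, 20, 50, 12,
                1, 3, 1, 24, 32, 12,
                2, 5, 1, 2, 11, 32)
    )

    # Variant 1: rank within each year/trt group, then regroup by year
    # alone so that group_split() returns one tibble per year
    by_year <- df |>
      group_by(year, trt) |>
      mutate(rank = dense_rank(desc(cover))) |>
      group_by(year) |>
      group_split()

    # Variant 2: split the data by year, then use lapply() to rank
    # within trt inside each per-year data frame
    by_year_lapply <- lapply(split(df, df$year), function(d) {
      d |>
        group_by(trt) |>
        mutate(rank = dense_rank(desc(cover))) |>
        ungroup()
    })

    by_year[[1]]$rank  # 2 3 3 1 1 2
    ```

    Both variants give a list with one element per year. The list from split() and lapply() is named by year, so by_year_lapply[["2015"]] retrieves a single year, whereas the list from group_split() is unnamed and is indexed by position.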