How to group_by() by two variables in a dplyr call?


I am computing the mean by week (wk). However, wk 21 has no data at all, so for that week I would like to use the mean by month instead. How do I switch from group_by(wk) to group_by(month) in the same dplyr call? The final result for wk 21 should be 5.8.

library(dplyr)
fish <- structure(list(wk = c(20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 
21, 21, 21, 21, 22, 22, 22, 22, 22, 22, 22), month = c(5, 5, 
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6), pd = c(6, 
4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 10, 
4, 5, NA, 6)), row.names = c(NA, -21L), class = "data.frame")

fish %>% group_by(wk) %>% summarise(Mean = mean(pd, na.rm = TRUE))

# A tibble: 3 x 2
     wk   Mean
  <dbl>  <dbl>
1    20   5   
2    21 NaN   
3    22   6.25

Solution

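  • You can't have two different groupings active at the same time, but you can attach the month mean to every row with mutate() first, then regroup by week and let coalesce() fall back to it wherever the weekly mean is NaN: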

    fish |> 
      group_by(month) |>
      mutate(month_mean = mean(pd, na.rm = TRUE)) |>
      group_by(wk) |>
      summarise(Mean = coalesce(mean(pd, na.rm = TRUE), first(month_mean)))
    # # A tibble: 3 × 2
    #      wk  Mean
    #   <dbl> <dbl>
    # 1    20  5   
    # 2    21  5.8
    # 3    22  6.25
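
The same fallback can also be written with the month means computed separately and joined back in, which makes it easier to see which weeks used the monthly value. This is only a sketch of one possible variant, not the answer's original code: the names month_means, result and used_fallback are made up for illustration, and the data is the question's fish data rebuilt with rep().

```r
library(dplyr)

# Same values as the question's `fish` data, rebuilt compactly with rep()
fish <- data.frame(
  wk    = rep(c(20, 21, 22), each = 7),
  month = c(rep(5, 19), 6, 6),
  pd    = c(6, 4, rep(NA, 14), 10, 4, 5, NA, 6)
)

# Step 1: one mean per month (hypothetical helper table)
month_means <- fish |>
  group_by(month) |>
  summarise(month_mean = mean(pd, na.rm = TRUE))

# Step 2: attach the month mean to each row, then summarise by week.
# A week with no non-missing pd values takes the month mean of its first row.
result <- fish |>
  left_join(month_means, by = "month") |>
  group_by(wk) |>
  summarise(
    used_fallback = all(is.na(pd)),  # TRUE when the week has no data at all
    Mean = if (used_fallback) first(month_mean) else mean(pd, na.rm = TRUE)
  )

print(result)
```

Note that, as in the answer above, first(month_mean) picks the month of the first row in the week. For a week that spans two months (like wk 22 here) and has no data, that means the earlier month's mean would be used.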