Tags: r, dplyr

Calculate the mean of group means in R with dplyr


Consider this dataframe:

library(dplyr)

df <- data.frame(id = c(1,1,1,2,2), x = 1:5)

  id x
1  1 1
2  1 2
3  1 3
4  2 4
5  2 5

To get the average x value per id, I use:

df |> group_by(id) |> dplyr::summarise(group_mean = mean(x))

# A tibble: 2 × 2
     id group_mean
  <dbl>      <dbl>
1     1        2  
2     2        4.5

I need to calculate the average of these group means, which is (2 + 4.5) / 2 = 3.25. However, this code fails:

df |> group_by(id) |> dplyr::summarise(group_mean = mean(x)) |> mean(group_mean)

[1] NA
Warning message:
In mean.default(dplyr::summarise(group_by(df, id), group_mean = mean(x)),  :
  argument is not numeric or logical: returning NA

Any suggestions?

EDIT: This question is not a duplicate of the one linked by @shizzle, because I'm looking for the unbalanced mean of means, i.e., a second stage of aggregation, and not for the first stage of calculating the per-group averages.
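
To make the distinction concrete, here is a small sketch in base R (only mean() and tapply(), on the same df as above; the variable names are just for illustration). The pooled mean gives every row equal weight, while the mean of group means gives every id equal weight, so the two differ when the groups have different sizes:

```r
df <- data.frame(id = c(1, 1, 1, 2, 2), x = 1:5)

# Pooled mean: every row counts equally
pooled_mean <- mean(df$x)
pooled_mean
#> [1] 3

# Mean of group means: every id counts equally
group_means <- tapply(df$x, df$id, mean)
mean_of_means <- mean(group_means)
mean_of_means
#> [1] 3.25
```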


Solution

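  • The pipe passes the whole tibble to mean() as its first argument, but mean() expects a numeric or logical vector, hence the NA. Pull the column out as a vector first, then take its mean: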

    library(dplyr)
    
    df |> 
      group_by(id) |> 
      dplyr::summarise(group_mean = mean(x)) |> 
      pull(group_mean) |>
      mean()
    #> [1] 3.25
    

    Created on 2024-07-31 with reprex v2.1.0
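
  • If you would rather keep the result in a tibble, a second summarise() should also work. This is a minimal sketch, assuming the same df as in the question; the column name mean_of_means is made up for illustration. With a single grouping variable, the first summarise() returns an ungrouped tibble, so the second one aggregates over all group means:

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 1, 2, 2), x = 1:5)

# Stage 1: one mean per id. Stage 2: mean of those group means.
result <- df |>
  group_by(id) |>
  summarise(group_mean = mean(x)) |>
  summarise(mean_of_means = mean(group_mean))

result$mean_of_means
#> [1] 3.25
```

    Use pull() when you only need the number, and the second summarise() when further dplyr steps follow.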