Search code examples
raggregatedata-management

Weighted mean using aggregate across groups in r


I want to take the weighted mean of a group using the aggregate function in r.

Here is what my data looks like:

set.seed(1980)
Group_1 <- sample(letters[1:4], 50, TRUE)
Group_2 <- sample(letters[8:13], 50, TRUE)
Weight <- sample(seq(1,50), 50, TRUE)
Value <- sample(seq(1,50), 50, TRUE)

DF <- data.frame(Group_1, Group_2, Weight, Value)

head(DF)

I want to take the weighted mean of the Value column, using the Weight column, for each pairwise group.

Such that, the aggregate function would look like:

aggregate(Value ~ Group_1 + Group_2, data = df, mean)

How would I take the weighted mean using the aggregate function?


Solution

  • Instead of mean, use weighted.mean. However, aggregate, may not be an option here because aggregate loop over only the 'Value' column and it doesn't have access to the 'Weight' for each group

    library(dplyr)
    DF %>%
         group_by(Group_1, Group_2) %>%
         summarise(wt_mean = weighted.mean(Value, Weight), .groups = 'drop')
    

    -output

    # A tibble: 21 x 3
    # Groups:   Group_1 [4]
    #   Group_1 Group_2 wt_mean
    #   <chr>   <chr>     <dbl>
    # 1 a       h         24.7 
    # 2 a       i         15   
    # 3 a       j         21.1 
    # 4 a       k         23.6 
    # 5 a       m         14.1 
    # 6 b       i         40   
    # 7 b       j         12.7 
    # 8 b       k          6.88
    # 9 b       l         30.6 
    10 b       m          5   
    # … with 11 more rows
    

    If we want to use base R, then by should work

    by(DF, DF[c('Group_1', 'Group_2')], function(x) weighted.mean(x$Value, x$Weight))