Search code examples
rdplyrpurrrweighted-average

pmap columns by group to get weighted row-wise mean


I'm trying to get the weighted row mean of certain columns by group.

library(tidyverse)
set.seed(1)
df <- data.frame(group = rep(LETTERS[3:4], each = 10),
           x = runif(n = 10, min = 10, max = 15),
           y = runif(n = 10, min = 100, max = 150),
           z = runif(n = 10, min = 100, max = 150))
df
#    group        x        y        z
# 1      C 11.32754 110.2987 146.7353
# 2      C 11.86062 108.8278 110.6071
# 3      C 12.86427 134.3511 132.5837
# 4      C 14.54104 119.2052 106.2778
# 5      C 11.00841 138.4921 113.3610
# 6      C 14.49195 124.8850 119.3057
# 7      C 14.72338 135.8809 100.6695
# 8      C 13.30399 149.5953 119.1194
# 9      C 13.14557 119.0018 143.4845
# 10     C 10.30893 138.8723 117.0174
# 11     D 11.32754 110.2987 146.7353
# 12     D 11.86062 108.8278 110.6071
# 13     D 12.86427 134.3511 132.5837
# 14     D 14.54104 119.2052 106.2778
# 15     D 11.00841 138.4921 113.3610
# 16     D 14.49195 124.8850 119.3057
# 17     D 14.72338 135.8809 100.6695
# 18     D 13.30399 149.5953 119.1194
# 19     D 13.14557 119.0018 143.4845
# 20     D 10.30893 138.8723 117.0174

To get crude row mean of x, y, z, I can do:

df %>% 
  mutate(rmean = pmap_dbl(list(x, y, z), ~mean(c(...))))

But I want to weight them by these weights

dfweight <- data.frame(group = c("C", "C", "C",
                                 "D", "D", "D"),
                       cat = c("x", "y", "z", 
                               "x", "y", "z"),
                       weights = c(.2, .7, .1, 
                                   .4, .1, .5))
#   group cat weights
# 1     C   x     0.2
# 2     C   y     0.7
# 3     C   z     0.1
# 4     D   x     0.4
# 5     D   y     0.1
# 6     D   z     0.5

I thought I should extract the weights first:

dfweight_split <- lapply(split(dfweight, dfweight$group), function (x) x$weights)
dfweight_split
# $C
# [1] 0.2 0.7 0.1

# $D
# [1] 0.4 0.1 0.5

But I'm then unsure how to pmap/map over these?

df %>% 
  group_by(group) %>% 
  mutate(wmean = pmap_dbl(list(x, y, z), ~weight.mean(c(..., dfweight_split))))

#OR
df %>% 
  group_by(group) %>% 
  mutate(wmean = map2(list(x, y, z), dfweight_split, ~weight.mean(.x, .y)))

Happy to see base solutions too. A similar post is here.

thanks


Solution

  • If we want to use pmap, make sure that the 'dfweight' data columns are also in the same dataset. An option is to reshape to wide with pivot_wider, then do a join (right_join) and use pmap to loop over the rows, extract the column elements on the same order with the notation .. before the index, pass those as vector arguments in weighted.mean to create the column in mutate

    library(dplyr)
    library(purrr)
    library(tidyr)
    library(stringr)
    dfweight %>% 
       pivot_wider(names_from = cat, values_from = weights) %>% 
       rename_at(-1, ~ str_c(., '_weight')) %>%
       right_join(df) %>% 
       mutate(wmean = pmap_dbl(select(., -group), 
         ~ weighted.mean(c(..4, ..5, ..6), c(..1, ..2, ..3)))) %>% 
       select(-ends_with('weight'))
    # A tibble: 20 x 5
    #   group     x     y     z wmean
    #   <chr> <dbl> <dbl> <dbl> <dbl>
    # 1 C      11.3  110.  147.  94.1
    # 2 C      11.9  109.  111.  89.6
    # 3 C      12.9  134.  133. 110. 
    # 4 C      14.5  119.  106.  97.0
    # 5 C      11.0  138.  113. 110. 
    # 6 C      14.5  125.  119. 102. 
    # 7 C      14.7  136.  101. 108. 
    # 8 C      13.3  150.  119. 119. 
    # 9 C      13.1  119.  143. 100. 
    #10 C      10.3  139.  117. 111. 
    #11 D      11.3  110.  147.  88.9
    #12 D      11.9  109.  111.  70.9
    #13 D      12.9  134.  133.  84.9
    #14 D      14.5  119.  106.  70.9
    #15 D      11.0  138.  113.  74.9
    #16 D      14.5  125.  119.  77.9
    #17 D      14.7  136.  101.  69.8
    #18 D      13.3  150.  119.  79.8
    #19 D      13.1  119.  143.  88.9
    #20 D      10.3  139.  117.  76.5