Search code examples
rdplyrtidyversepurrr

Calculate the row-wise weighted sum for a set of columns


I have, say, the following data frame:

> library(tidyverse)
> dd <- tibble(a = rep(1,10), b = rep(1,10), c = rep(1,10))
> dd
# A tibble: 10 × 3
       a     b     c
   <dbl> <dbl> <dbl>
 1     1     1     1
 2     1     1     1
 3     1     1     1
 4     1     1     1
 5     1     1     1
 6     1     1     1
 7     1     1     1
 8     1     1     1
 9     1     1     1
10     1     1     1

and a vector of weights:

> weight <- c(1, 5, 10)
> weight
[1]  1  5 10

when I want to calculate the row-wise weighted sum for all the columns of the dataframe together, I do this:

> dd %>% mutate(m = rowSums(map2_dfc(dd, weight,`*`)))
# A tibble: 10 × 4
       a     b     c     m
   <dbl> <dbl> <dbl> <dbl>
 1     1     1     1    16
 2     1     1     1    16
 3     1     1     1    16
 4     1     1     1    16
 5     1     1     1    16
 6     1     1     1    16
 7     1     1     1    16
 8     1     1     1    16
 9     1     1     1    16
10     1     1     1    16

but I don't know how to calculate the row-wise weighted sum for a subset of the data frame. I tried the code below, but it gives messy results:

> dd %>% rowwise() %>% mutate(m = rowwise(map2_dfc(c_across(b:c), weight[2:3],`*`)))
New names:
• `` -> `...1`
• `` -> `...2`
New names:
• `` -> `...1`
• `` -> `...2`
New names:
• `` -> `...1`
• `` -> `...2`
New names:
• `` -> `...1`
• `` -> `...2`
New names:
• `` -> `...1`
• `` -> `...2`
New names:
• `` -> `...1`
• `` -> `...2`
New names:
• `` -> `...1`
• `` -> `...2`
New names:
• `` -> `...1`
• `` -> `...2`
New names:
• `` -> `...1`
• `` -> `...2`
New names:
• `` -> `...1`
• `` -> `...2`
# A tibble: 10 × 4
# Rowwise: 
       a     b     c m$...1 $...2
   <dbl> <dbl> <dbl>  <dbl> <dbl>
 1     1     1     1      5    10
 2     1     1     1      5    10
 3     1     1     1      5    10
 4     1     1     1      5    10
 5     1     1     1      5    10
 6     1     1     1      5    10
 7     1     1     1      5    10
 8     1     1     1      5    10
 9     1     1     1      5    10
10     1     1     1      5    10

Can someone please give me a hint as to how to approach this problem?


Solution

  • Using tidyverse methods, we can create a named vector for 'weight', loop across the columns 'b' to 'c', subset the 'weight' value based on the column name (cur_column()), multiply and get the rowSums

    library(dplyr)
    names(weight) <- names(dd)
    dd %>% 
       mutate(m = rowSums(across(b:c,  ~ .x * weight[cur_column()])))
    

    -output

    # A tibble: 10 × 4
           a     b     c     m
       <dbl> <dbl> <dbl> <dbl>
     1     1     1     1    15
     2     1     1     1    15
     3     1     1     1    15
     4     1     1     1    15
     5     1     1     1    15
     6     1     1     1    15
     7     1     1     1    15
     8     1     1     1    15
     9     1     1     1    15
    10     1     1     1    15
    

    Or if we want to use rowwise (not recommended as it is slower)

    dd %>% 
      rowwise %>%
      mutate(m = sum(c_across(b:c) * weight[2:3])) %>%
      ungroup
    

    Or use crossprod

    dd %>%
       mutate(m = crossprod(t(pick(b:c)), weight[2:3])[,1])
    

    Or with base R

    dd$m <-  rowSums(dd[2:3] * weight[2:3][col(dd[2:3])])