Search code examples
rdplyr

How to "sumproduct" based on separate dataframe


I have the following data with 5 numerical steps:

    test_data <- tribble(
      ~one, ~two, ~three, ~four, ~five, 
      1, 2, 3, 4, 5, 
      2, 4, 6, 8, 10,
      3, 6, 9, 12, 15, 
      4, 8, 12, 16, 20
    )

And I would like to multiply the following dataframe:

    multiplier <- tribble(
      ~step, ~pct,
      "one", 0,
      "two", 1, 
      "three", 0, 
      "four", 1, 
      "five", 0
    )

Such that I get the "result" column in the following: (i.e. only certain steps out of the five are added, based on their definition in multiplier)

    > result_df
    # A tibble: 4 x 6
        one   two three  four  five result
      <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
    1     1     2     3     4     5      6
    2     2     4     6     8    10     12
    3     3     6     9    12    15     18
    4     4     8    12    16    20     24

This is a simple example with 5 steps and a 100% multiplier, so it is easy to select and add the specific columns using rowSums. However, the issue is that I am working with a real world data set with c. 40 steps and some percentages that are 0% < x < 100%. As such, I am in need of a "sum product" style function.

I have tried the following:

test_data %>% 
  rowwise() %>% 
  mutate(result = prod(one:five, multiplier$pct))

and get the wrong answer. Additionally, I have tried crossprod however I understand this is akin to matrix multiplication.

A dplyr style solution will really be appreciated. Thanks.


Solution

  • Use matrix multiplication:

    library(dplyr)
    test_data %>%
      mutate(result = c(as.matrix(.) %*% multiplier$pct))
    

    giving

    # A tibble: 4 × 6
        one   two three  four  five result
      <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
    1     1     2     3     4     5      6
    2     2     4     6     8    10     12
    3     3     6     9    12    15     18
    4     4     8    12    16    20     24
    

    This also works

    test_data %>% mutate(result = colSums(t(.) * multiplier$pct))