Search code examples
rdplyracross

How to select columns using cur_column and apply function?


I am trying to add columns using rowSums because it is a fast function, using cur_column because the select columns depend on the initial columns a_value b_value. This is a reprex, so it needs to use rowSums cur_column or something similar because it will be used on thousands of rows and ~ 16 columns.

input:

library(tidyverse)

df <- tibble(
  a_value = c(1, 2, 3),
  b_value = c(4, 5, 6),
  a_param1 = c(7, 8, 9),
  b_param1 = c(10, 11, 12),
  a_param2 = c(13, 14, 15),
  b_param2 = c(16, 17, 18),
  a_param3 = c(19, 20, 21),
  b_param3 = c(22, 23, 24),
  a_param4 = c(25, 26, 27),
  b_param4 = c(28, 29, 30)
)

result of input.

# A tibble: 3 x 10
  a_value b_value a_param1 b_param1 a_param2 b_param2 a_param3 b_param3 a_param4 b_param4
    <dbl>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1       1       4        7       10       13       16       19       22       25       28
2       2       5        8       11       14       17       20       23       26       29
3       3       6        9       12       15       18       21       24       27       30

desired output. The a|bvalue_value_exc_parami i+1 is the sum of a|b_parami and a|b_parami+1

# A tibble: 3 x 14
  a_value b_value a_param1 b_param1 a_param2 b_param2 a_param3 b_param3 a_param4 b_param4 a_value_exc_param_12 b_value_exc_param_12 a_value_exc_param_34 b_value_exc_param_34
    <dbl>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>                <dbl>                <dbl>                <dbl>                <dbl>
1       1       4        7       10       13       16       19       22       25       28                   20                   26                   44                   50
2       2       5        8       11       14       17       20       23       26       29                   22                   28                   46                   52
3       3       6        9       12       15       18       21       24       27       30                   24                   30                   48                   54

What I tried so far, it does not work:


df %>% 
  mutate(
    across(
      contains("_value"),
      list(
      param_12 = rowSums(~ select(.,
                     get(str_replace(cur_column(), "_value", "_param1")),
                     get(str_replace(cur_column(), "_value", "_param2"))
                     )
           ),
      param_34 = rowSums(~ select(.,
                       get(str_replace(cur_column(), "_value", "_param3")),
                       get(str_replace(cur_column(), "_value", "_param4"))
      )
      
      ),
      .names = "{.col}_exc_{.fn}"
           
  )
  )
  )

There might be another easy and efficient approach I'm not aware of, feel free to provide, thanks.


Solution

  • You can use the following :

    library(dplyr)
    library(stringr)
    
    df %>% 
      transmute(across(contains("_value"),
                    list(param_12 = ~rowSums(select(cur_data(),
                                str_replace(cur_column(), "_value", "_param1"),
                                str_replace(cur_column(), "_value", "_param2"))), 
                         param_34 = ~rowSums(select(cur_data(),
                                str_replace(cur_column(), "_value", "_param3"),
                                str_replace(cur_column(), "_value", "_param4")))), 
                    .names = "{.col}_exc_{.fn}"))
    
    #  a_value_exc_param_12 a_value_exc_param_34 b_value_exc_param_12 b_value_exc_param_34
    #                 <dbl>                <dbl>                <dbl>                <dbl>
    #1                   20                   44                   26                   50
    #2                   22                   46                   28                   52
    #3                   24                   48                   30                   54
    

    I have used transmute instead of mutate here to show the output. In your real case, you can of course use mutate to append these new columns.


    We can write a function to avoid repetition of code.

    apply_sum <- function(data, col, val, replace) {
      rowSums(select(data, str_replace(col, val, replace)))
    }
    
    df %>% 
      transmute(across(contains("_value"),
                       list(
                   param_12 = ~apply_sum(cur_data(), cur_column(), 'value', c('param1', 'param2')),
                   param_34 = ~apply_sum(cur_data(), cur_column(), 'value', c('param3', 'param4'))),
                       .names = "{.col}_exc_{.fn}"))