
Can you use dplyr across() to iterate across pairs of columns?


I have 18 pairs of variables and I would like to do pair-wise math on them to calculate 18 new variables. The across() function in dplyr is quite handy when applying a formula to a single column. Is there a way to apply across() to pairs of columns?

Tiny example with simple division of 2 variables (my actual code will be more complex, some ifelse, ...):

library(tidyverse)
library(glue)

# filler data
df <- data.frame("label" = c('a','b','c','d'),
                 "A" = c(4, 3, 8, 9),
                 "B" = c(10, 0, 4, 1),
                 "error_A" = c(0.4, 0.3, 0.2, 0.1),
                 "error_B" = c(0.3, 0, 0.4, 0.1))

# what I want to have in the end 
# instead of just 2 (A, B), I have 18
df1 <- df %>% mutate(
  'R_A' = A/error_A,
  'R_B' = B/error_B
)

# what I'm thinking of doing: use both A and error_A to calculate the new column
# (pseudocode, the placeholder in braces is the part I don't know how to write)
df2 <- df %>% mutate(
  across(c('A','B'),
         ~.x/{HOW DO I USE THE COLUMN WHOSE NAME IS glue('error_',.x)},
         .names = 'R_{.col}')
)

Solution

  • One option is map/reduce. Store the columns of interest in 'nm1', loop over them with map_dfc(), select the columns whose names end with each entry (A and error_A, then B and error_B), divide them with reduce(), rename the resulting columns (map_dfc() column-binds the results, hence the _dfc suffix), and bind them to the original dataset.

    library(dplyr)
    library(purrr)
    library(stringr)
    nm1 <- c('A', 'B')
    map_dfc(nm1, ~ df %>% 
                    select(ends_with(.x)) %>% 
                    reduce(., `/`) ) %>%
        rename_all(~ str_c('R_', nm1)) %>%
        bind_cols(df, .)
    

    Output:

    #  label A  B error_A error_B R_A      R_B
    #1     a 4 10     0.4     0.3  10 33.33333
    #2     b 3  0     0.3     0.0  10      NaN
    #3     c 8  4     0.2     0.4  40 10.00000
    #4     d 9  1     0.1     0.1  90 10.00000
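
    Note that select(ends_with(.x)) relies on each value column appearing before its error column in the data, and on no other column name ending with the same suffix. As a minimal sketch under the assumption that every error column is named 'error_' plus the value column name, the pairs can instead be built explicitly by name in base R. The names vars, err_cols and out are illustrative only.

    ```r
    # Same filler data as in the question
    df <- data.frame(label = c("a", "b", "c", "d"),
                     A = c(4, 3, 8, 9),
                     B = c(10, 0, 4, 1),
                     error_A = c(0.4, 0.3, 0.2, 0.1),
                     error_B = c(0.3, 0, 0.4, 0.1))

    vars <- c("A", "B")                  # value columns
    err_cols <- paste0("error_", vars)   # assumed naming scheme for the paired columns

    # Dividing two data frames of equal shape works element-wise, column by column
    out <- df
    out[paste0("R_", vars)] <- df[vars] / df[err_cols]
    out
    ```

    Because the pairing is done by name, the result does not depend on column order.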
    

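    Another option uses across() directly. Inside across(), cur_column() returns the name of the column currently being processed, so get() can look up the matching error_ column: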

    df %>% 
        mutate(across(c(A, B),
                      ~ .x / get(str_c('error_', cur_column())),
                      .names = 'R_{.col}'))
    #  label A  B error_A error_B R_A      R_B
    #1     a 4 10     0.4     0.3  10 33.33333
    #2     b 3  0     0.3     0.0  10      NaN
    #3     c 8  4     0.2     0.4  40 10.00000
    #4     d 9  1     0.1     0.1  90 10.00000
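
The question mentions that the real calculation involves some ifelse logic. As a hedged sketch of how the across() approach might extend to that case, the lambda below looks up the paired column once and then applies a condition to it. The rule used here (return NA when the error is zero) is only an assumed example, not something specified in the question, and the names err and res are illustrative.

```r
library(dplyr)

# Same filler data as in the question
df <- data.frame(label = c("a", "b", "c", "d"),
                 A = c(4, 3, 8, 9),
                 B = c(10, 0, 4, 1),
                 error_A = c(0.4, 0.3, 0.2, 0.1),
                 error_B = c(0.3, 0, 0.4, 0.1))

res <- df %>%
  mutate(across(c(A, B), ~ {
    # Look up the paired column by name, e.g. error_A while processing A
    err <- get(paste0("error_", cur_column()))
    # Assumed example rule: avoid dividing by a zero error
    if_else(err == 0, NA_real_, .x / err)
  }, .names = "R_{.col}"))

res
```

With this rule, the second row of R_B becomes NA instead of NaN, and the other values match the output above.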