
How to apply a function f(x, y) of two variables across sets of variables ending with .x and .y using dplyr


Sample data:

    sampdat <- data.frame(grp = rep(c("a", "b", "c"), c(2, 3, 5)),
                          x1 = seq(0, 0.9, 0.1), x2 = seq(0.3, 0.75, 0.05),
                          y1 = 1:10, y2 = 11:20)

I would like to produce the following data, but I have 100+ pairs of variables to which I'd like to apply a function of two variables:

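    library(dplyr)

    # Return x * y explicitly (the original `z = x*y` only returned it invisibly)
    myfun <- function(x, y) {
      x * y
    }
    needdat <- sampdat %>% mutate(z1 = x1 * y1, z2 = x2 * y2)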

What is the most efficient approach to doing this using dplyr's across and summarise?

Thanks in advance for your suggestions/solutions!

Best, SaM


Solution

  • The easiest option is to use two across() calls and multiply them:

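    library(dplyr)
    library(stringr)
    # Each across() returns a data frame of the selected columns. Multiplying
    # the two data frames works element-wise, pairing columns by position
    # (x1 with y1, x2 with y2); the result keeps the first frame's names,
    # which .names rewrites from x* to z*.
    sampdat %>% 
      mutate(across(starts_with('x'),
                    .names = "{str_replace(.col, 'x', 'z')}") * 
               across(starts_with('y')))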
    

    Output:

       grp  x1   x2 y1 y2  z1   z2
    1    a 0.0 0.30  1 11 0.0  3.3
    2    a 0.1 0.35  2 12 0.2  4.2
    3    b 0.2 0.40  3 13 0.6  5.2
    4    b 0.3 0.45  4 14 1.2  6.3
    5    b 0.4 0.50  5 15 2.0  7.5
    6    c 0.5 0.55  6 16 3.0  8.8
    7    c 0.6 0.60  7 17 4.2 10.2
    8    c 0.7 0.65  8 18 5.6 11.7
    9    c 0.8 0.70  9 19 7.2 13.3
    10   c 0.9 0.75 10 20 9.0 15.0
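The two-across() trick pairs columns purely by position, so it silently gives wrong results if the x and y columns are not in the same order or one of them is missing. A minimal sanity check to run before the mutate(), assuming (as in the sample data) that partner columns share the same suffix after the leading x/y, might look like this. Note that starts_with() ignores case by default, which the grep() calls mirror with ignore.case = TRUE.

```r
# Sample data from the question
sampdat <- data.frame(grp = rep(c("a", "b", "c"), c(2, 3, 5)),
                      x1 = seq(0, 0.9, 0.1), x2 = seq(0.3, 0.75, 0.05),
                      y1 = 1:10, y2 = 11:20)

# The columns starts_with('x') / starts_with('y') would select,
# in the order they appear in the data frame
xcols <- grep("^x", names(sampdat), value = TRUE, ignore.case = TRUE)
ycols <- grep("^y", names(sampdat), value = TRUE, ignore.case = TRUE)

# Positional pairing is only safe if the suffixes line up one-to-one
stopifnot(length(xcols) == length(ycols),
          identical(sub("^x", "", xcols, ignore.case = TRUE),
                    sub("^y", "", ycols, ignore.case = TRUE)))
```

If the check fails, reorder the columns (for example with select()) before multiplying.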
    

    Or with the dplyover package, whose across2() applies a function to two sets of columns in parallel:

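    library(dplyover)
    # across2() pairs the two column sets by position; inside the lambda
    # .x is the x column and .y the matching y column. {idx} is the index
    # of the pair, so the new columns are named z1, z2.
    sampdat %>% 
      mutate(across2(starts_with('x'), starts_with('y'),
                     ~ .x * .y, .names = "z{idx}"))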
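Both approaches above pair the columns by position. Another option, sketched here in base R rather than dplyr, matches each xN column to its yN partner by name and calls the question's myfun() once per pair, so myfun() only needs to accept two vectors. This assumes every xN column has a yN partner; the zN output names follow the question.

```r
# Sample data and function from the question
sampdat <- data.frame(grp = rep(c("a", "b", "c"), c(2, 3, 5)),
                      x1 = seq(0, 0.9, 0.1), x2 = seq(0.3, 0.75, 0.05),
                      y1 = 1:10, y2 = 11:20)
myfun <- function(x, y) x * y

# Build matching name vectors: x1 -> y1 -> z1, x2 -> y2 -> z2, ...
xcols <- grep("^x", names(sampdat), value = TRUE)
ycols <- sub("^x", "y", xcols)
zcols <- sub("^x", "z", xcols)

# Map() calls myfun(x_column, y_column) for each pair and returns a list
# of vectors, which is assigned as new columns z1, z2, ...
sampdat[zcols] <- Map(myfun, sampdat[xcols], sampdat[ycols])
```

Because the partners are looked up by name, column order does not matter. A missing yN column would make the sampdat[ycols] lookup fail with an "undefined columns selected" error, not give a silent mismatch.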