
How to use dplyr (1.1.0) across() with list of functions using shared arguments


Quick set up with dummy data and functions

Load dplyr
library(dplyr)

Set up simple data frame with one column
data <- data.frame(a = 1:5)

Define two functions
newfun1 <- function(x, val) {x + val}
newfun2 <- function(x, val) {x * val}

Store functions as named list
usefuns <- stats::setNames(as.list(c(newfun1, newfun2)), c("fun1", "fun2"))

[Image: named list of functions]

Goal: Apply each function in usefuns to data column a, with the val argument set to 100

Using dplyr < 1.1.0, I can make it work easily:

data %>% mutate(across(.col = a, .fns = usefuns, val = 100))

[Image: results of previous code; data frame with three columns]

However, using dplyr 1.1.0, I get this warning:

The ... argument of across() is deprecated as of dplyr 1.1.0. Supply arguments directly to .fns through an anonymous function instead.
#Previously
across(a:b, mean, na.rm = TRUE)
#Now
across(a:b, \(x) mean(x, na.rm = TRUE))

I can make it work with dplyr 1.1.0 using:

data %>% mutate(across(.col = a, .fns = list(fun1 = ~newfun1(.x, val = 100), fun2 = ~newfun2(.x, val = 100))))

or even

data %>% mutate(across(.col = a, .fns = list(fun1 = ~usefuns$fun1(.x, val = 100), fun2 = ~usefuns$fun2(.x, val = 100))))

[Image: results of previous code; data frame with three columns]

but I know there must be a simpler way. In my real-world use case, the number of functions in usefuns will vary and there are several more arguments, but the arguments passed to each function will always be the same.

I think I'm missing something relatively simple and have already wasted too much time experimenting. Any pointers are appreciated!


As an added note, val may have differing values each time it is used:

Set up simple data frame with three columns

data <- data.frame(a = 1:5, b = 6:10, c = 11:15)

Example of more complicated application of functions using dplyr < 1.1.0:

data %>% mutate(across(.col = c(a, b), .fns = usefuns, val = 100), across(.col = c, .fns = usefuns, val = 200))

I've tried variations on how the functions are listed, named, stored, and called, and started going down the path of using purrr, but couldn't get as close to working as the code above. I'm wondering whether purrr's partial() function could come into play, but I can't figure out how, or whether, that would work.


Solution

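  • Pass your function list to purrr::map() (or lapply()) with purrr::partial as the mapped function, supplying the value of val. Each function in the list is then pre-filled with that val, so across() only has to pass the column.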

    library(purrr)
    # map() calls partial() on each function in usefuns, pre-filling val = 100
    data %>%
        mutate(across(.cols = a,
                      .fns = purrr::map(usefuns,
                                        purrr::partial,
                                        val = 100)))
    

    Or for the more complex example:

    data <- data.frame(a = 1:5, b = 6:10, c = 11:15)
    # A different val per across() call: 100 for columns a and b, 200 for column c
    data %>%
        mutate(across(.cols = c(a, b),
                      .fns = map(usefuns, partial, val = 100)),
               across(.cols = c,
                      .fns = map(usefuns, partial, val = 200)))
      a  b  c a_fun1 a_fun2 b_fun1 b_fun2 c_fun1 c_fun2
    1 1  6 11    101    100    106    600    211   2200
    2 2  7 12    102    200    107    700    212   2400
    3 3  8 13    103    300    108    800    213   2600
    4 4  9 14    104    400    109    900    214   2800
    5 5 10 15    105    500    110   1000    215   3000
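
If you would rather not depend on purrr, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: with_args() is a hypothetical helper name (it does not exist in dplyr or purrr), and it assumes every function in the list takes the column as its first argument. It wraps each function in a one-argument closure, which is the kind of anonymous function the dplyr 1.1.0 warning asks for.

```r
library(dplyr)

# Same setup as in the question
data <- data.frame(a = 1:5, b = 6:10, c = 11:15)
newfun1 <- function(x, val) {x + val}
newfun2 <- function(x, val) {x * val}
usefuns <- list(fun1 = newfun1, fun2 = newfun2)

# Hypothetical helper (not a dplyr or purrr function): returns a list of
# one-argument functions, each calling the original with the extra
# arguments in `...` already filled in. lapply() keeps the list names,
# so across() can still build names such as a_fun1.
with_args <- function(fns, ...) {
  args <- list(...)
  lapply(fns, function(f) {
    force(f)
    function(x) do.call(f, c(list(x), args))
  })
}

result <- data %>%
  mutate(across(.cols = c(a, b), .fns = with_args(usefuns, val = 100)),
         across(.cols = c, .fns = with_args(usefuns, val = 200)))
```

Because the shared arguments are collected through `...`, the same call pattern should carry over when there are several of them, for example with_args(usefuns, val = 100, other = TRUE), where other is a made-up argument name used only for illustration.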