Tags: r, dplyr, tidyselect

Apply dplyr::starts_with() with lambda function


I have the following implementation:

library(dplyr)
library(tidyr)
dat = data.frame('A' = 1:3, 'C_1' = 1:3, 'C_2' = 1:3, 'M' = 1:3)

The following works:

dat %>%
  rowwise %>%
  mutate(Anew = list({function(x) c(x[1]^2, x[2] + 5, x[3] + 1)}(c(M, C_1, C_2)))) %>%
  ungroup %>%
  unnest_wider(Anew, names_sep = "")

However, the following does not work when I try to select the columns with dplyr::starts_with():

dat %>%
  rowwise %>%
  mutate(Anew = list({function(x) c(x[1]^2, x[2] + 5, x[3] + 1)}(c(M, starts_with('C_'))))) %>%
  ungroup %>%
  unnest_wider(Anew, names_sep = "")

Any pointers on how to apply starts_with() correctly in this context would be very helpful.

PS: This is a continuation of my earlier post, "Apply custom function that returns multiple values after dplyr::rowwise()".
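For context on why the second attempt fails: starts_with() is a tidyselect selection helper, so it only resolves to column names inside a function that performs column selection, such as select(), across() or c_across(). Inside a plain c(...) call in mutate() there is no selection context, so it raises an error instead of returning the C_ columns. A minimal sketch, assuming a recent dplyr/tidyselect (the exact error wording differs between versions):

```r
library(dplyr)

dat <- data.frame(A = 1:3, C_1 = 1:3, C_2 = 1:3, M = 1:3)

# Inside a selecting function, starts_with() resolves to column names
sel <- dat %>% select(starts_with("C_"))
names(sel)  # "C_1" "C_2"

# Inside an ordinary expression in mutate() there is no selection context,
# so starts_with() signals an error instead of returning the columns
res <- try(
  dat %>% rowwise() %>% mutate(x = list(c(M, starts_with("C_")))),
  silent = TRUE
)
inherits(res, "try-error")  # TRUE
```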


Solution

  • If we wrap starts_with() in c_across(), the anonymous function works. This assumes a third column that starts with C_ (C_3 in the data at the end), so x here is c(C_1, C_2, C_3) rather than c(M, C_1, C_2) as in the question.

    library(dplyr)
    library(tidyr)
    dat %>%
      rowwise() %>%
      # c_across() resolves starts_with("C_") to C_1, C_2, C_3 for the current row
      mutate(Anew = list((function(x) c(x[1]^2, x[2] + 5, x[3] + 1))(
        c_across(starts_with("C_"))))) %>%
      ungroup() %>%
      unnest_wider(Anew, names_sep = "")
    

    -output

    # A tibble: 3 × 8
          A   C_1   C_2   C_3     M Anew1 Anew2 Anew3
      <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
    1     1     1     1     1     1     1     6     2
    2     2     2     2     2     2     4     7     3
    3     3     3     3     3     3     9     8     4
    

    Or, instead of using rowwise(), we can create a named list of functions and apply them column-wise with across(). This is usually more efficient, because each function runs once on a whole column vector instead of once per row.

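    # One function per C_ column, looked up by name inside across()
    fns <- list(C_1 = function(x) x^2, C_2 = function(x) x + 5,
                C_3 = function(x) x + 1)
    dat %>%
      mutate(across(starts_with("C_"),
                    # cur_column() is the name of the column being processed
                    ~ fns[[cur_column()]](.x),
                    # .names yields Anew1, Anew2, Anew3 here
                    .names = "Anew{seq_along(.fn)}"))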
    

    -output

      A C_1 C_2 C_3 M Anew1 Anew2 Anew3
    1 1   1   1   1 1     1     6     2
    2 2   2   2   2 2     4     7     3
    3 3   3   3   3 3     9     8     4
    

    data (the question's data plus a C_3 column)

    dat <- data.frame('A' = 1:3, 'C_1' = 1:3, 'C_2' = 1:3, C_3 = 1:3, 'M' = 1:3)
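Both approaches above apply the functions to C_1, C_2 and C_3, whereas the question applied x[1]^2 to M. The following is a sketch (not part of the original answer) that keeps M and runs on the question's original data without C_3, assuming dplyr 1.0 or later and tidyr. In the row-wise version, M is combined with c_across(); in the column-wise version, M is added to the across() selection. The list name fns_m and the "{.col}_new" naming pattern are illustrative choices.

```r
library(dplyr)
library(tidyr)

# The question's original data (no C_3 column)
dat <- data.frame(A = 1:3, C_1 = 1:3, C_2 = 1:3, M = 1:3)

# Row-wise: combine M with the C_ columns selected via c_across(),
# so x = c(M, C_1, C_2) as in the question's working version
res_rowwise <- dat %>%
  rowwise() %>%
  mutate(Anew = list((function(x) c(x[1]^2, x[2] + 5, x[3] + 1))(
    c(M, c_across(starts_with("C_")))
  ))) %>%
  ungroup() %>%
  unnest_wider(Anew, names_sep = "")

# Column-wise: one function per column, looked up by name with cur_column()
# (fns_m is a hypothetical name used only for this illustration)
fns_m <- list(
  M   = function(x) x^2,
  C_1 = function(x) x + 5,
  C_2 = function(x) x + 1
)
res_across <- dat %>%
  mutate(across(c(M, starts_with("C_")),
                ~ fns_m[[cur_column()]](.x),
                .names = "{.col}_new"))

res_rowwise
res_across
```

The column-wise version produces M_new, C_1_new and C_2_new with the same values as Anew1, Anew2 and Anew3 from the row-wise version.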