
Pass column names to dplyr::coalesce() when writing a custom function


I'm trying to write a function that will wrap dplyr::coalesce(), and will take in a data object and column names to coalesce. So far, my attempts have failed.

Example data

library(dplyr)

df <-
  data.frame(col_a = c("bob", NA, "bob", NA, "bob"),
             col_b = c(NA, "danny", NA, NA, NA),
             col_c = c("paul", NA, NA, "paul", NA))

##   col_a col_b col_c
## 1   bob  <NA>  paul
## 2  <NA> danny  <NA>
## 3   bob  <NA>  <NA>
## 4  <NA>  <NA>  paul
## 5   bob  <NA>  <NA>

Taking a stab at writing a custom function

coalesce_plus_1 <- function(data, vars) {

  data %>%
    mutate(coalesced_col = coalesce(!!! rlang::syms(tidyselect::vars_select(names(.), vars))))

}
coalesce_plus_2 <- function(data, vars) {
  
  data %>%
    mutate(coalesced_col = coalesce(!!! rlang::syms(vars)))
  
}
coalesce_plus_3 <- function(data, vars) {
  
  data %>%
    mutate(coalesced_col = coalesce({{ vars }}))
  
}

The results...

coalesce_plus_1()

df %>%
  coalesce_plus_1(data = ., vars = c(col_a, col_b, col_c))

Error: object 'col_a' not found.

However:

df %>%
  coalesce_plus_1(data = ., vars = all_of(starts_with("col")))

##   col_a col_b col_c coalesced_col
## 1  <NA>  <NA>  paul          paul
## 2  <NA> danny  <NA>         danny
## 3   bob  <NA>  <NA>           bob
## 4  <NA>  <NA>  paul          paul
## 5   bob  <NA>  <NA>           bob


coalesce_plus_2()

df %>%
  coalesce_plus_2(data = ., vars = c(col_a, col_b, col_c))

Error in lapply(.x, .f, ...) : object 'col_a' not found

And also

df %>%
  coalesce_plus_2(data = ., vars = all_of(starts_with("col")))

Error: starts_with() must be used within a selecting function.
i See https://tidyselect.r-lib.org/reference/faq-selection-context.html.
Run rlang::last_error() to see where the error occurred.
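For comparison, coalesce_plus_2() does produce the intended column when vars is a character vector, because rlang::syms() expects strings and turns each one into a symbol. The bare-name call fails because c(col_a, col_b, col_c) is evaluated as ordinary R code, where col_a is not a defined object. A minimal check, reusing the question's data and function unchanged (stringsAsFactors = FALSE is added only so the columns stay character on R versions before 4.0):

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(col_a = c("bob", NA, "bob", NA, "bob"),
                 col_b = c(NA, "danny", NA, NA, NA),
                 col_c = c("paul", NA, NA, "paul", NA),
                 stringsAsFactors = FALSE)

# The question's second attempt, unchanged
coalesce_plus_2 <- function(data, vars) {
  data %>%
    mutate(coalesced_col = coalesce(!!! rlang::syms(vars)))
}

# Quoted names: syms() converts each string to a symbol, and !!! splices
# them in, so mutate() sees coalesce(col_a, col_b, col_c)
out <- coalesce_plus_2(df, c("col_a", "col_b", "col_c"))
out$coalesced_col  # "bob" "danny" "bob" "paul" "bob"
```

This version still accepts only strings, so a helper such as starts_with() would need a separate step to turn it into column names.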



coalesce_plus_3()

df %>%
  coalesce_plus_3(data = ., vars = c(col_a, col_b, col_c))

Error: Problem with mutate() input coalesced_col.
x Input coalesced_col can't be recycled to size 5.
i Input coalesced_col is coalesce(c(col_a, col_b, col_c)).
i Input coalesced_col must be size 5 or 1, not 15.
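The "not 15" part follows from how {{ }} works: it injects the argument expression unchanged, so mutate() evaluates coalesce(c(col_a, col_b, col_c)), and c() concatenates the three 5-row columns into one 15-element vector before coalesce() ever sees it. The same concatenation can be reproduced with base R's with(), used here only as a stand-in for dplyr's data mask:

```r
df <- data.frame(col_a = c("bob", NA, "bob", NA, "bob"),
                 col_b = c(NA, "danny", NA, NA, NA),
                 col_c = c("paul", NA, NA, "paul", NA),
                 stringsAsFactors = FALSE)

# c() glues the three columns end to end instead of passing them
# as three separate arguments
combined <- with(df, c(col_a, col_b, col_c))
length(combined)  # 15, i.e. 5 rows x 3 columns
```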

And also

df %>%
  coalesce_plus_3(data = ., vars = all_of(starts_with("col")))

Error: Problem with mutate() input coalesced_col.
x starts_with() must be used within a selecting function.
i See https://tidyselect.r-lib.org/reference/faq-selection-context.html.
i Input coalesced_col is coalesce(all_of(starts_with("col"))).
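Both helper errors share a cause: starts_with() only has meaning while tidyselect is resolving a selection against a set of column names, and inside coalesce() nothing provides that context. tidyselect::eval_select() is one function that does provide it. This small sketch resolves the helper against the example data and returns the matching column positions, named by column:

```r
library(rlang)
library(tidyselect)

df <- data.frame(col_a = c("bob", NA, "bob", NA, "bob"),
                 col_b = c(NA, "danny", NA, NA, NA),
                 col_c = c("paul", NA, NA, "paul", NA),
                 stringsAsFactors = FALSE)

# eval_select() evaluates the defused selection against df's columns
# and returns an integer vector of positions named by column
pos <- eval_select(expr(starts_with("col")), df)
names(pos)  # "col_a" "col_b" "col_c"
```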

Bottom line

How can I write a custom function around coalesce() that takes a data object and the columns to coalesce, and accepts both explicit column names, e.g., c(col_a, col_b, col_c), and selection helpers, e.g., starts_with("col"), in its vars argument?


Solution

  • This is a simple implementation. As first written it returned only the selected columns, but it is easy to extend it to keep all columns by binding them back on with bind_cols() at the end, which is what the edited version does.

    It's simple because we rely on select() to do the work for us, as suggested at the start of the Implementing tidyselect vignette.

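    # edited to keep all columns
    library(dplyr)
    library(purrr)  # for invoke(); recent purrr versions deprecate it in favour of exec()
    
    coalesce_df = function(data, ...) {
      data %>%
        select(...) %>%                              # tidyselect resolves names and helpers
        transmute(result = invoke(coalesce, .)) %>%  # coalesce() across the selected columns
        bind_cols(data, .)                           # put all original columns back in front
    }
    
    df %>%
      coalesce_df(everything())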
    #   col_a col_b col_c result
    # 1   bob  <NA>  paul    bob
    # 2  <NA> danny  <NA>  danny
    # 3   bob  <NA>  <NA>    bob
    # 4  <NA>  <NA>  paul   paul
    # 5   bob  <NA>  <NA>    bob
    
    df %>% coalesce_df(col_a, col_b)
    #   col_a col_b col_c result
    # 1   bob  <NA>  paul    bob
    # 2  <NA> danny  <NA>  danny
    # 3   bob  <NA>  <NA>    bob
    # 4  <NA>  <NA>  paul   <NA>
    # 5   bob  <NA>  <NA>    bob
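If you would rather keep the question's single vars argument, the same "let select() do the work" idea can be written by forwarding vars with {{ }}. This is a sketch assuming a recent dplyr (1.0 or later); coalesce_vars() is a hypothetical name, and the selected columns are spliced into coalesce() with !!! so each one arrives as a separate argument:

```r
library(dplyr)

df <- data.frame(col_a = c("bob", NA, "bob", NA, "bob"),
                 col_b = c(NA, "danny", NA, NA, NA),
                 col_c = c("paul", NA, NA, "paul", NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: `vars` is forwarded to select() with {{ }}, so
# tidyselect resolves bare names, c(...) and helpers alike
coalesce_vars <- function(data, vars) {
  cols <- select(data, {{ vars }})
  # unname() stops the column names from becoming argument names;
  # !!! splices each selected column in as its own argument
  mutate(data, coalesced_col = coalesce(!!!unname(as.list(cols))))
}

by_name   <- coalesce_vars(df, c(col_a, col_b))
by_helper <- coalesce_vars(df, starts_with("col"))
by_name$coalesced_col    # "bob" "danny" "bob" NA "bob"
by_helper$coalesced_col  # "bob" "danny" "bob" "paul" "bob"
```

Unlike coalesce_df(), this keeps the original columns without a separate bind_cols() step, because mutate() adds the new column to the full data.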