
Calling a data variable within a custom ifelse() in R


I am trying to write a specialized ifelse() function that I want to pass to dplyr::mutate(across()). The function should replace NA values in columns specified in across() with those in similarly-named columns.

For instance in the following made-up data, I want to replace missing x_var1 with y_var1 and missing x_var2 with y_var2:

x <- tribble(~x_var1, ~x_var2, ~y_var1, ~y_var2,
             5, 2, 0, 0,
             NA, 10, 8, 0,
             3, NA, 0, 5,
             NA, NA, 7, 9)   

I have tried constructing the following function:

ifelse_spec <- function(var) {
  new_var = paste("y_", str_remove(cur_column(), "x_"), sep = "")
 
  # print(new_var) # just to check new_var is correct 

  ifelse(is.na(var), !!sym(new_var) , var)  # how to call new_var?
}

x %>%
  mutate(across(c(x_var1, x_var2),
                ~ ifelse_spec(.)))

but it doesn't seem to work.

However, if I run this one-variable case using ifelse directly, I get the expected result.

x %>% 
  mutate(across(c(x_var1),
                ~ifelse(is.na(.), !!sym("y_var1"), .)))

How can I construct a custom ifelse statement that will allow me to call a data variable?

Edit: I got the following to work for the many-variable case, but still using ifelse and not a different function.

x %>% 
  mutate(across(c(x_var1, x_var2),
                ~ifelse(is.na(.), eval(sym(paste("y_", str_remove(cur_column(), "x_"), sep = ""))), . )))

Solution

  • coalesce() is designed for this problem (filling missing values from other columns). You can simplify your one-variable case by using it instead of ifelse:

    library(dplyr, warn.conflicts = FALSE)
    library(stringr)
    library(purrr)
    
    x <- tribble(~x_var1, ~x_var2, ~y_var1, ~y_var2,
                 5, 2, 0, 0,
                 NA, 10, 8, 0,
                 3, NA, 0, 5,
                 NA, NA, 7, 9)
    
    x %>% 
      mutate(x_var1 = coalesce(x_var1, y_var1))
    #> # A tibble: 4 x 4
    #>   x_var1 x_var2 y_var1 y_var2
    #>    <dbl>  <dbl>  <dbl>  <dbl>
    #> 1      5      2      0      0
    #> 2      8     10      8      0
    #> 3      3     NA      0      5
    #> 4      7     NA      7      9
    

    You can then use select() to generalise this to coalesce across similarly-named columns:

    x %>% 
      mutate(x_var1 = do.call(coalesce, select(., ends_with("var1"))))
    #> # A tibble: 4 x 4
    #>   x_var1 x_var2 y_var1 y_var2
    #>    <dbl>  <dbl>  <dbl>  <dbl>
    #> 1      5      2      0      0
    #> 2      8     10      8      0
    #> 3      3     NA      0      5
    #> 4      7     NA      7      9
    

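    Finally, use map_dfc() to apply this to every column, using pattern matching to extract the "column group" each one belongs to. Note that this coalesces all columns in a group, so the y_ columns are overwritten as well as the x_ columns: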

    x %>% 
      colnames() %>% 
      str_extract("var[0-9]") %>% 
      set_names(colnames(x)) %>% 
      map_dfc(~do.call(coalesce, select(x, ends_with(.))))
    #> # A tibble: 4 x 4
    #>   x_var1 x_var2 y_var1 y_var2
    #>    <dbl>  <dbl>  <dbl>  <dbl>
    #> 1      5      2      5      2
    #> 2      8     10      8     10
    #> 3      3      5      3      5
    #> 4      7      9      7      9
    

    You will need to adapt str_extract() and ends_with() to fit the column names in your real data, but I think this should generalise to any reasonable naming scheme. If it's important to apply a custom function to your real data instead of coalesce(), it should also be possible to rewrite map_dfc() to use it.
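
If you would rather keep across() and a custom function, one option is to pass the fallback column in as an ordinary argument instead of looking it up inside the function. `!!` is only interpreted in the expression that mutate() captures, not in the body of a separately defined function, which is why the original ifelse_spec() fails. The sketch below assumes the x_/y_ prefix convention from the question; fill_na() is a hypothetical helper name, not part of dplyr:

```r
library(dplyr, warn.conflicts = FALSE)
library(stringr)

x <- tribble(~x_var1, ~x_var2, ~y_var1, ~y_var2,
             5, 2, 0, 0,
             NA, 10, 8, 0,
             3, NA, 0, 5,
             NA, NA, 7, 9)

# Hypothetical helper: both vectors arrive as plain arguments,
# so the function body needs no tidy evaluation at all.
fill_na <- function(var, fallback) {
  ifelse(is.na(var), fallback, var)
}

result <- x %>%
  mutate(across(
    c(x_var1, x_var2),
    # The lambda is evaluated inside the data mask, so the partner
    # column can be looked up by name via the .data pronoun.
    ~ fill_na(.x, .data[[str_replace(cur_column(), "^x_", "y_")]])
  ))
```

With the example data, x_var1 becomes 5, 8, 3, 7 and x_var2 becomes 2, 10, 5, 9, while y_var1 and y_var2 are left unchanged. Swapping the body of fill_na() for coalesce(var, fallback) should give the same result.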