I am trying to write a specialized ifelse() function that I want to pass to dplyr::mutate(across()). The function should replace NA values in the columns specified in across() with the values in similarly-named columns.
For instance, in the following made-up data, I want to replace missing x_var1 with y_var1 and missing x_var2 with y_var2:
x <- tribble(~x_var1, ~x_var2, ~y_var1, ~y_var2,
             5,  2,  0, 0,
             NA, 10, 8, 0,
             3,  NA, 0, 5,
             NA, NA, 7, 9)
I have tried constructing the following function:
ifelse_spec <- function(var) {
  new_var = paste("y_", str_remove(cur_column(), "x_"), sep = "")
  # print(new_var) # just to check new_var is correct
  ifelse(is.na(var), !!sym(new_var), var) # how to call new_var?
}
x %>%
  mutate(across(c(x_var1, x_var2),
                ~ ifelse_spec(.)))
but it doesn't seem to work.
However, if I run this one-variable case using ifelse directly, I get the expected result:
x %>%
  mutate(across(c(x_var1),
                ~ ifelse(is.na(.), !!sym("y_var1"), .)))
How can I construct a custom ifelse statement that will allow me to call a data variable?
Edit: I got the following to work for the many-variable case, but it still uses ifelse directly rather than a separate function.
x %>%
  mutate(across(c(x_var1, x_var2),
                ~ ifelse(is.na(.), eval(sym(paste("y_", str_remove(cur_column(), "x_"), sep = ""))), .)))
coalesce() is designed for this problem (filling missing values from other columns). You can simplify your one-variable case by using it instead of ifelse:
library(dplyr, warn.conflicts = FALSE)
library(stringr)
library(purrr)
x <- tribble(~x_var1, ~x_var2, ~y_var1, ~y_var2,
             5,  2,  0, 0,
             NA, 10, 8, 0,
             3,  NA, 0, 5,
             NA, NA, 7, 9)
x %>%
  mutate(x_var1 = coalesce(x_var1, y_var1))
#> # A tibble: 4 x 4
#> x_var1 x_var2 y_var1 y_var2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5 2 0 0
#> 2 8 10 8 0
#> 3 3 NA 0 5
#> 4 7 NA 7 9
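To make the element-wise behaviour concrete, here is a minimal sketch on two plain vectors. The names primary and fallback are made up for illustration; the values mirror x_var1 and y_var1 from the example data.

```r
library(dplyr, warn.conflicts = FALSE)

# Illustrative vectors (hypothetical names), same values as x_var1 / y_var1
primary  <- c(5, NA, 3, NA)
fallback <- c(0, 8, 0, 7)

# coalesce() keeps each non-missing element of `primary` and takes the
# element at the same position from `fallback` wherever `primary` is NA
filled <- coalesce(primary, fallback)
filled
#> [1] 5 8 3 7

# The same result written with ifelse(), as in the question
via_ifelse <- ifelse(is.na(primary), fallback, primary)
```

For this data both versions give the same values, but coalesce() also accepts more than two vectors, which the later steps rely on.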
You can then use select() to generalise this to coalesce across similarly-named columns:
x %>%
  mutate(x_var1 = do.call(coalesce, select(., ends_with("var1"))))
#> # A tibble: 4 x 4
#> x_var1 x_var2 y_var1 y_var2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5 2 0 0
#> 2 8 10 8 0
#> 3 3 NA 0 5
#> 4 7 NA 7 9
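The do.call() step may look opaque, so here is a small sketch of what it amounts to. A tibble is a list of columns, so do.call() passes each selected column to coalesce() as a separate argument, in the order the columns appear in the data. The intermediate names var1_cols, via_do_call and explicit are made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

x <- tribble(~x_var1, ~x_var2, ~y_var1, ~y_var2,
             5,  2,  0, 0,
             NA, 10, 8, 0,
             3,  NA, 0, 5,
             NA, NA, 7, 9)

# The columns whose names end in "var1", in their original order
# (x_var1 first, then y_var1)
var1_cols <- select(x, ends_with("var1"))

# do.call() spreads the columns out as arguments to coalesce() ...
via_do_call <- do.call(coalesce, var1_cols)

# ... which for this data is the same as writing the call out by hand
explicit <- coalesce(x$x_var1, x$y_var1)
```

Because earlier arguments take priority in coalesce(), the column order of the data decides which column wins when both have a value.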
Finally, use map_dfc to apply this function to each column, using pattern matching to extract the "column group" it belongs to:
x %>%
  colnames() %>%
  str_extract("var[0-9]") %>%
  set_names(colnames(x)) %>%
  map_dfc(~ do.call(coalesce, select(x, ends_with(.))))
#> # A tibble: 4 x 4
#> x_var1 x_var2 y_var1 y_var2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5 2 5 2
#> 2 8 10 8 10
#> 3 3 5 3 5
#> 4 7 9 7 9
Note that, as the output shows, this coalesces every column within its group, so the y_ columns are filled from the x_ columns as well. You will need to adapt str_extract() and ends_with() to fit the column names in your real data, but I think this should generalise to any reasonable naming scheme. If it's important to apply a custom function to your real data instead of coalesce(), it should also be possible to rewrite the map_dfc() call to use it.
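If a custom function is needed rather than coalesce(), one possible sketch is below. It is not the approach shown above: it sidesteps both across() and map_dfc() and loops over the x_ columns of the data frame directly, which avoids the question of how to look up a data variable by name inside a helper. The names fill_missing and fill_from_partner are hypothetical, and the sketch assumes every x_ column has a matching y_ column.

```r
library(dplyr, warn.conflicts = FALSE)
library(stringr)

x <- tribble(~x_var1, ~x_var2, ~y_var1, ~y_var2,
             5,  2,  0, 0,
             NA, 10, 8, 0,
             3,  NA, 0, 5,
             NA, NA, 7, 9)

# Hypothetical stand-in for the custom replacement rule; any function of
# two equal-length vectors (primary, fallback) could be swapped in here.
fill_missing <- function(primary, fallback) {
  ifelse(is.na(primary), fallback, primary)
}

# Hypothetical helper: for each column whose name matches `from`, build the
# partner column's name by swapping the prefix, then apply `fn` to the pair.
# Assumes a partner column exists for every matching column.
fill_from_partner <- function(data, fn, from = "^x_", to = "y_") {
  targets <- str_subset(colnames(data), from)
  for (col in targets) {
    partner <- str_replace(col, from, to)
    data[[col]] <- fn(data[[col]], data[[partner]])
  }
  data
}

result <- fill_from_partner(x, fill_missing)
result
```

Unlike the map_dfc() version, this only modifies the x_ columns and leaves the y_ columns as they were. The from and to arguments would need adjusting for other naming schemes.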