I am using R dplyr and trying to mutate multiple columns in place. These columns are var1-var3, and they currently contain values of either 1 or NA. I want to apply the following logic: if a given row in var1-var3 has at least one value = 1, then any NA values for that row will be transformed to zero. If a row has NA across all columns (var1-var3), then the values will remain NA. I am trying this:
df <- data.frame("var1" = c(1,NA,1,NA,NA),
"var2" = c(NA,NA,NA,1,1),
"var3" = c(1,NA,NA,1,NA),
"age" = c(25,41,39,60,36) ,
"satisfaction" = c(5,3,2,5,4)
)
# Output
# var1 var2 var3 age sat
# 1 1 NA 1 25 5
# 2 NA NA NA 41 3
# 3 1 NA NA 39 2
# 4 NA 1 1 60 5
# 5 NA 1 NA 36 4
df <- df %>%
mutate_at(vars(contains('var')), ~ case_when(if_any(.x, `==`, 1),
ifelse(is.na(.x), 0, .x ), .x) #replace NA with 0
)
One solution which first checks if all the values across your chosen columns are NA, and then applies case_when()
in combination with replace_na()
from {tidyr}:
library(dplyr)
df <- df |>
mutate(all_na = rowSums(!is.na(across(contains('var'))))) |>
mutate(across(contains('var'), ~case_when(
all_na != 0 ~ tidyr::replace_na(.x, 0),
all_na == 0 ~ .x
))) |>
select(-all_na)
returns:
var1 var2 var3 age satisfaction
1 1 0 1 25 5
2 NA NA NA 41 3
3 1 0 0 39 2
4 0 1 1 60 5
5 0 1 0 36 4