I'm trying to recode several dummy variables at once but am struggling to come up with a functioning vectorized solution (alternatively a for loop).
reprex:
library(tidyverse)
library(magrittr)
library(dummies)
library(janitor)
df_raw <- data.frame(
species = as.factor(c("cat", "dog", NA, "dog", "dog")),
weight = rnorm(5, mean = 5, sd = 1),
sex = as.factor(c("m", NA, "f", "f", "m"))
)
df_raw
species weight sex
1 cat 3.025896 m
2 dog 3.223064 <NA>
3 <NA> 5.230367 f
4 dog 4.231511 f
5 dog 5.819032 m
I split the factor variables (species and sex) into dummies, but the NAs get their own indicators (species_na and sex_na):
df_dummy <- dummies::dummy.data.frame(df_raw,
dummy.classes = "factor",
sep = "_",
omit.constants = TRUE,
all = TRUE) %>%
janitor::clean_names()
species_cat species_dog species_na weight sex_f sex_m sex_na
1 1 0 0 3.025896 0 1 0
2 0 1 0 3.223064 0 0 1
3 0 0 1 5.230367 1 0 0
4 0 1 0 4.231511 1 0 0
5 0 1 0 5.819032 0 1 0
My problem: how do I efficiently recode all of the factor dummies ("indexed" by the prefix, e.g. species_) to NA conditional on the value of the _na dummy in the respective group of dummies? In other words, I need to set all dummies with the prefix species_ to NA whenever species_na == 1, and likewise for every other group (here sex_).
I have come up with the partial solution below, but I haven't been able to generalize the last step to the entire dataset:
factor_vars <- dplyr::select_if(df_raw, is.factor) %>% colnames()
na_labs <- paste(factor_vars,
"na",
sep = "_")
df_dummy <- df_dummy %>%
dplyr::mutate(across(all_of(na_labs),
.fns = list(var = ~ . == 1),
.names = "{fn}_{col}" ))
# --- trial run for one variable only
test <- df_dummy %>%
mutate(species_cat = ifelse(var_species_na == TRUE,
NA,
species_cat))
Any help is appreciated!
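You can try the following, which handles each group of dummies separately and then binds the results together column-wise: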
library(dplyr)
library(purrr)
df_dummy <- dummies::dummy.data.frame(df_raw,
dummy.classes = "factor",
sep = "_",
omit.constants = TRUE,
all = TRUE) %>%
janitor::clean_names()
factor_vars <- dplyr::select_if(df_raw, is.factor) %>% colnames()
na_labs <- paste(factor_vars,
"na",
sep = "_")
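# named outer function: in a nested ~ lambda, .x would be the column, not the factor name
map_dfc(factor_vars, function(v) {
  grp <- select(df_dummy, starts_with(paste0(v, "_")))
  is_na <- grp[[paste0(v, "_na")]] == 1   # rows where factor v was missing
  mutate(grp, across(everything(), ~ ifelse(is_na, NA, .x)))
})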
# species_cat species_dog species_na sex_f sex_m sex_na
#1 1 0 0 0 1 0
#2 0 1 0 NA NA NA
#3 NA NA NA 1 0 0
#4 0 1 0 1 0 0
#5 0 1 0 0 1 0
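Note that the result above contains only the dummy columns, so weight is no longer present. If the other columns need to be kept, one option is to update the dummies in place with a plain loop. The following is only a minimal sketch in base R, assuming the cleaned column names follow the prefix_level pattern shown in the question. The df_dummy here is typed in by hand (with rounded weights) so that the block runs without the dummies package, and df_out and level_cols are names made up for this illustration:

```r
# Hand-built copy of the dummy-coded data from the question (weights rounded)
df_dummy <- data.frame(
  species_cat = c(1, 0, 0, 0, 0),
  species_dog = c(0, 1, 0, 1, 1),
  species_na  = c(0, 0, 1, 0, 0),
  weight      = c(3.03, 3.22, 5.23, 4.23, 5.82),
  sex_f       = c(0, 0, 1, 1, 0),
  sex_m       = c(1, 0, 0, 0, 1),
  sex_na      = c(0, 1, 0, 0, 0)
)
factor_vars <- c("species", "sex")

df_out <- df_dummy
for (v in factor_vars) {
  na_col <- paste0(v, "_na")
  # Rows where the original factor was missing
  is_na <- df_dummy[[na_col]] == 1
  # Level dummies of this group, excluding the _na indicator itself
  level_cols <- setdiff(
    grep(paste0("^", v, "_"), names(df_dummy), value = TRUE),
    na_col
  )
  df_out[is_na, level_cols] <- NA
}

# Drop the _na indicators once they are no longer needed (optional)
df_out[paste0(factor_vars, "_na")] <- NULL
df_out
```

Unlike the map_dfc() version, this leaves the _na indicators untouched until the final step and keeps weight in its original position. If the indicators should stay in the data, the last assignment can simply be omitted.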