Replace factor value by NA if condition


I want to replace values in a factor variable depending on another column, while not changing the initial factor levels.

Example:

x <- structure(list(
  Payee = structure(c(NA, 1L, 2L), .Label = c("0", "x"), class = "factor"),
  PayeeID_Hash = structure(c(NA, 1L, 2L),
    .Label = c("0x31BCA02", "0xB672841"), class = "factor")),
  row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
> x
# A tibble: 3 x 2
  Payee PayeeID_Hash
  <fct> <fct>       
1 NA    NA          
2 0     0x31BCA02   
3 x     0xB672841  

When Payee is '0', then the corresponding PayeeID_Hash value should not exist (i.e. it should be NA). Please note that I do not want to drop the factor level 0x31BCA02 (it will be present in other rows where Payee has level x). Also, I want to keep the PayeeID_Hash levels as they are (I do not want to replace them with other values).

Expected output:

> x
# A tibble: 3 x 2
  Payee PayeeID_Hash
  <fct> <fct>       
1 NA    NA          
2 0     NA          
3 x     0xB672841  

I can do this by converting the factor to character and then back to factor:

x %>%
  mutate(PayeeID_Hash = as.character(PayeeID_Hash),
         PayeeID_Hash = ifelse(Payee == "0", NA_character_, PayeeID_Hash),
         PayeeID_Hash = as.factor(PayeeID_Hash))

Is there a cleaner (i.e. more straightforward) way to do this?


Solution

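  • We can use replace() and skip the conversions to character and back to factor. replace() assigns NA in place, so the column stays a factor; ifelse(), by contrast, strips the factor class and returns the underlying integer codes unless the column is first converted to character. Note that droplevels() removes any level left unused after the replacement; omit it to keep every original level, as the question asks.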

    library(dplyr)
    x %>%
       mutate(PayeeID_Hash = droplevels(replace(PayeeID_Hash, Payee == "0", NA)))
    # A tibble: 3 x 2
    #  Payee PayeeID_Hash
    #  <fct> <fct>       
    #1 <NA>  <NA>        
    #2 0     <NA>        
    #3 x     0xB672841
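
To check how each step treats the factor levels, here is a minimal base R sketch. It assumes a plain data.frame that mirrors the example data rather than the original tibble, and the names hit, kept, dropped and codes are only illustrative. It uses which() so that the NA produced by the missing Payee in row 1 never reaches the index.

```r
# Toy data mirroring the question (plain data.frame, base R only)
x <- data.frame(
  Payee        = factor(c(NA, "0", "x"), levels = c("0", "x")),
  PayeeID_Hash = factor(c(NA, "0x31BCA02", "0xB672841"),
                        levels = c("0x31BCA02", "0xB672841"))
)

# Rows where Payee is "0"; which() discards the NA comparison from row 1
hit <- which(x$Payee == "0")

# replace() assigns NA in place: still a factor, both levels retained
kept <- replace(x$PayeeID_Hash, hit, NA)
levels(kept)     # "0x31BCA02" "0xB672841"

# droplevels() removes the level that no longer occurs in the data
dropped <- droplevels(kept)
levels(dropped)  # "0xB672841"

# ifelse() on the bare factor loses the class and yields integer codes
codes <- ifelse(x$Payee == "0", NA, x$PayeeID_Hash)
class(codes)     # "integer"
```

If the goal is to keep 0x31BCA02 available as a level for other rows, the kept variant (without droplevels()) is the one that matches the requirement.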