
Conditional recoding of factor


I need to recode a couple of factor variables, but my attempts keep failing.

Suppose my data looks like this:

df <- data.frame(a  = c("1","2","","Other"),
                 b  = c("3","","Other","Other"),
                 v1 = c("no","no","yes","yes"),
                 v2 = c("no","yes","no","no"),
                 v3 = c("no","yes","yes","no"))
df$a <- as.character(df$a)
df$b <- as.character(df$b)

df

>       a     b  v1  v2  v3
> 1     1     3  no  no  no
> 2     2        no yes yes
> 3       Other yes  no yes
> 4 Other Other yes  no  no

I want

v1 to be "yes" if (a=="1" | b=="1"),

v2 to be "yes" if (a=="2" | b=="2") and

v3 to be "yes" if (a=="3" | b=="3").

So the pattern is:

v# to be "yes" if (a=="#" | b=="#").

I tried base R with two nested loops, but it did not work:

for (i in c("a", "b")) {
  for (j in as.character(1:3)) {
    df[which(df[, i] == j), ][, c(paste("v", j, sep = ""))] <- "yes"
  }
}

I would prefer to do this using dplyr::mutate, but I don't know how.


Solution

  • library(data.table)
    dt = as.data.table(df) # or convert in place using setDT(df)
    
    # for each i, set column v<i> to 'yes' (by reference) in rows where a or b equals i
    for (i in 1:3) dt[a == i | b == i, paste0('v', i) := 'yes']
    dt
    #       a     b  v1  v2  v3
    #1:     1     3 yes  no yes
    #2:     2        no yes yes
    #3:       Other yes  no yes
    #4: Other Other yes  no  no
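
For comparison, here is a minimal sketch of the same recoding in base R and with dplyr::mutate, since the question asks about both. A likely cause of the failure in the nested-loop attempt is that which() returns an empty index for some combinations (for example, no row has a == "3"), so the chained df[rows, ][, col] <- "yes" tries to assign into a zero-row data frame. Indexing with a logical vector in a single step avoids that. The sketch assumes a, b and v1 to v3 are character columns (hence stringsAsFactors = FALSE) and a dplyr version that provides across() and cur_column() (1.0.0 or later); the names base_res, dplyr_res, hit and j are only illustrative.

```r
library(dplyr)

# Same example data as in the question, kept as character (assumption)
df <- data.frame(a  = c("1", "2", "", "Other"),
                 b  = c("3", "", "Other", "Other"),
                 v1 = c("no", "no", "yes", "yes"),
                 v2 = c("no", "yes", "no", "no"),
                 v3 = c("no", "yes", "yes", "no"),
                 stringsAsFactors = FALSE)

# Base R: one logical row index per target column, assigned in a single step.
# An all-FALSE index simply changes nothing.
base_res <- df
for (j in as.character(1:3)) {
  hit <- base_res$a == j | base_res$b == j
  base_res[hit, paste0("v", j)] <- "yes"
}

# dplyr: across() visits v1..v3; cur_column() gives the current column name,
# from which the digit to compare against a and b is extracted.
dplyr_res <- df %>%
  mutate(across(v1:v3, ~ {
    j <- sub("^v", "", cur_column())
    ifelse(a == j | b == j, "yes", .x)
  }))

base_res
dplyr_res
```

If the v columns are factors, the base R loop should still work as long as "yes" is an existing level, but the ifelse() call in the dplyr version would return integer codes, so converting those columns with as.character() first is the safer choice.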