Recode a variable using data.table


I am trying to recode a variable using data.table. I have googled for almost 2 hours but couldn't find an answer.

Assume I have a data.table like the following:

library(data.table)
DT <- data.table(V1=c(0L,1L,2L),
                 V2=LETTERS[1:3],
                 V4=1:12)

I want to recode V1 and V2. For V1, I want to recode 1s to 0 and 2s to 1. For V2, I want to recode A to T, B to K, C to D.

If I use dplyr, it is simple.

library(dplyr)
DT %>% 
  mutate(V1 = recode(V1, `1` = 0L, `2` = 1L)) %>% 
  mutate(V2 = recode(V2, A = "T", B = "K", C = "D"))

But I can't find a concise way to do this in data.table:

DT[V1==1, V1 := 0L]
DT[V1==2, V1 := 1L]
DT[V2=="A", V2 := "T"]
DT[V2=="B", V2 := "K"]
DT[V2=="C", V2 := "D"]

The code above is the best I can come up with, but there must be a better, more efficient way to do this.


Edit

I changed how I want to recode V2 to make my example more general.


Solution

  • I think this might be what you're looking for. On the left-hand side of := we name the columns we want to update, and on the right-hand side we give, in the same order, the expressions that produce their new values.

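    # V1 == 2 is TRUE only for the 2s, so 0 and 1 become 0 and 2 becomes 1.
    DT[, c("V1","V2") := .(as.numeric(V1==2), sapply(V2, function(x) {if(x=="A") "T"
                                                         else if (x=="B") "K"
                                                         else if (x=="C") "D"
                                                         else x }))]  # leave any other value unchanged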
    
     #   V1 V2 V4
     #1:  0  T  1
     #2:  0  K  2
     #3:  1  D  3
     #4:  0  T  4
     #5:  0  K  5
     #6:  1  D  6
     #7:  0  T  7
     #8:  0  K  8
     #9:  1  D  9
    #10:  0  T 10
    #11:  0  K 11
    #12:  1  D 12
    

    Alternatively, just use dplyr's recode() inside the data.table call:

    library(dplyr)
    DT[, c("V1","V2") := .(as.numeric(V1==2), recode(V2, "A" = "T", "B" = "K", "C" = "D"))]
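
If you would rather not load dplyr, the same recoding can be sketched with base R's match() and a data.table update join. This is only a sketch on the example data from the question: recode_lookup and v2_map are hypothetical names introduced here for illustration, not part of data.table, and the helper assumes the new values have the same type as the column being recoded.

```r
library(data.table)

# Same example data as in the question.
DT <- data.table(V1 = c(0L, 1L, 2L),
                 V2 = LETTERS[1:3],
                 V4 = 1:12)

# Hypothetical helper (not part of data.table): replace each value of x found
# in `old` with the element of `new` at the same position; leave the rest as-is.
recode_lookup <- function(x, old, new) {
  idx <- match(x, old)        # position of each x in `old`, NA when absent
  hit <- !is.na(idx)          # TRUE where a replacement exists
  x[hit] <- new[idx[hit]]
  x
}

# V1: 1 -> 0 and 2 -> 1; the 0s are not in `old`, so they stay 0.
DT[, V1 := recode_lookup(V1, old = c(1L, 2L), new = c(0L, 1L))]

# V2: update join against a small mapping table (hypothetical name v2_map).
# Rows of DT whose V2 matches v2_map$old get the corresponding v2_map$new.
v2_map <- data.table(old = c("A", "B", "C"), new = c("T", "K", "D"))
DT[v2_map, on = .(V2 = old), V2 := i.new]

print(DT)
```

Both steps look up every replacement against the original values before assigning, so overlapping recodes (for example A to B together with B to C) would not chain into each other, and values missing from the mapping are left untouched. The mapping table is also easier to maintain than a long if/else chain once there are many codes.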