Search code examples
rone-hot-encodingdummy-variable

R multi-hot encoding among multiple columns


My data is in the shape

Event Id Var1 Var2 Var3
1      a   x    w    y
2      a   z    y    w
3      b   x    y    q

and I need to create multi-hot encoded vectors for each row in the table, considering all the values appearing in Var1, Var2 and Var3. Meaning that the desired output would be:

Event Id  x  y  z  w  q
1      a  1  1  0  1  0
2      a  0  1  1  1  0
3      b  1  1  0  0  1

Meaning that I keep the same number of rows of the initial dataset, I only add for each row a number of columns equal to all the unique factors among Var 1, Var 2 and Var3.

I tried all aproaches I could think of, but nothing seems to work so far..

Any idea?


Solution

  • You can use data.table -

    dt <- read.table(text="Event Id Var1 Var2 Var3
    1      a   x    w    y
    2      a   z    y    w
    3      b   x    y    q",header=T)
    
    setDT(dt)
    dcast(setDT(melt(dt,id.vars = c("Event","Id")))[,ind:=1],Event+Id~value,value.var = "ind",fill=0)
    

    Output-

       Event Id Var1 Var2 Var3 q w x y z
    1:     1  a    1    1    1 0 1 1 1 0
    2:     2  a    1    1    1 0 1 0 1 1
    3:     3  b    1    1    1 1 0 1 1 0