My data has this shape:
Event Id Var1 Var2 Var3
1 a x w y
2 a z y w
3 b x y q
I need to create a multi-hot encoded vector for each row of the table, considering all the values that appear in Var1, Var2 and Var3. The desired output is:
Event Id x y z w q
1 a 1 1 0 1 0
2 a 0 1 1 1 0
3 b 1 1 0 0 1
In other words, I keep the same number of rows as the initial dataset and only add, for each row, one column per unique value found across Var1, Var2 and Var3.
I tried every approach I could think of, but nothing has worked so far.
Any idea?
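For comparison, here is a minimal base-R sketch of the same row-wise multi-hot encoding, with no packages required. It assumes the three Var columns hold plain character values. The object names (`events`, `vals`, `hot`, `result`) are illustrative, and the new columns follow the order in which values first appear, reading row by row.

```r
# Base-R sketch (illustrative names): one 0/1 column per distinct value in Var1..Var3
events <- read.table(text = "Event Id Var1 Var2 Var3
1 a x w y
2 a z y w
3 b x y q", header = TRUE, stringsAsFactors = FALSE)

vars <- c("Var1", "Var2", "Var3")
m <- as.matrix(events[vars])  # character matrix, one row per event

# Distinct values in order of first appearance, reading row by row
vals <- unique(as.vector(t(m)))

# Set a column to 1 in every row where its value occurs in any Var column
hot <- matrix(0L, nrow = nrow(m), ncol = length(vals),
              dimnames = list(NULL, vals))
for (v in vals) hot[, v] <- as.integer(rowSums(m == v) > 0)

# Keep the identifier columns and append the indicator columns
result <- cbind(events[c("Event", "Id")], hot)
print(result)
```

The number of rows stays the same as the input, and a value that appears twice in one row would still produce a single 1.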
You can use data.table:
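library(data.table)
dt <- read.table(text = "Event Id Var1 Var2 Var3
1 a x w y
2 a z y w
3 b x y q", header = TRUE, stringsAsFactors = FALSE)
setDT(dt)
# Reshape to long format (one row per Event/Id/value) and flag each value with 1,
# then cast back to wide format with one column per unique value; missing combinations get 0
long <- melt(dt, id.vars = c("Event", "Id"))
long[, ind := 1]
dcast(long, Event + Id ~ value, value.var = "ind", fill = 0)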
Output (dcast sorts the new columns alphabetically):
Event Id q w x y z
1: 1 a 0 1 1 1 0
2: 2 a 0 1 0 1 1
3: 3 b 1 0 1 1 0
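To get the exact column layout from the question (x y z w q), the cast result can be reordered in place with `setcolorder`. This is a minimal sketch. It builds the same data inline with `data.table()` rather than `read.table`, and it assumes the target order is known in advance. The name `res` is illustrative.

```r
library(data.table)

# Same input as above, built inline
dt <- data.table(Event = 1:3, Id = c("a", "a", "b"),
                 Var1 = c("x", "z", "x"),
                 Var2 = c("w", "y", "y"),
                 Var3 = c("y", "w", "q"))

# Long format with an indicator column, then wide format with 0 for missing combinations
long <- melt(dt, id.vars = c("Event", "Id"))
long[, ind := 1]
res <- dcast(long, Event + Id ~ value, value.var = "ind", fill = 0)

# dcast returns q w x y z; reorder by reference to match the question's layout
setcolorder(res, c("Event", "Id", "x", "y", "z", "w", "q"))
print(res)
```

Because `setcolorder` works by reference, it does not copy the table, which can matter when there are many rows.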