I have a number of data frames, each containing a factor that I want to expand into binary dummy variables (one-hot encoding). Not every possible level is present in every data frame, but I do know the full set of possible levels (there are 70 of them). I want to add a dummy column for every possible level to every data frame.
With the code below I can create the dummies within each data frame, but only for the levels that actually occur in it. For example, set1.df has no person in category "E" or "F", whilst set2.df has no one in category "D". What I need are additional columns set1.dfE and set1.dfF in set1.df that are all zeros, and a column set2.dfD in set2.df that is all zeros. I cannot rbind set1.df and set2.df before creating the dummies, because I need to process each data frame using the binary variables before rbinding. To reiterate, I know beforehand which levels are possible in my data, e.g. "A" to "F".
library(dummies)
person_id <- c(1,2,3,4,5,6,7,8,9,10)
person_cat <- c("A","B","C","A","B","C","D","A","A","A")
set1.df <- data.frame(person_id,person_cat)
person_id <- c(11,12,13,14,15,16,17,18,19,20)
person_cat <- c("A","B","C","A","B","C","E","E","F","A")
set2.df <- data.frame(person_id,person_cat)
dummies1 <- dummy(set1.df[,2])
dummies2 <- dummy(set2.df[,2])
dummies1
dummies2
The expected output is:
> dummies1
set1.dfA set1.dfB set1.dfC set1.dfD set1.dfE set1.dfF
[1,] 1 0 0 0 0 0
[2,] 0 1 0 0 0 0
[3,] 0 0 1 0 0 0
[4,] 1 0 0 0 0 0
[5,] 0 1 0 0 0 0
[6,] 0 0 1 0 0 0
[7,] 0 0 0 1 0 0
[8,] 1 0 0 0 0 0
[9,] 1 0 0 0 0 0
[10,] 1 0 0 0 0 0
> dummies2
set2.dfA set2.dfB set2.dfC set2.dfD set2.dfE set2.dfF
[1,] 1 0 0 0 0 0
[2,] 0 1 0 0 0 0
[3,] 0 0 1 0 0 0
[4,] 1 0 0 0 0 0
[5,] 0 1 0 0 0 0
[6,] 0 0 1 0 0 0
[7,] 0 0 0 0 1 0
[8,] 0 0 0 0 1 0
[9,] 0 0 0 0 0 1
[10,] 1 0 0 0 0 0
library(dummies)
person_id <- c(1,2,3,4,5,6,7,8,9,10)
person_cat <- c("A","B","C","A","B","C","D","A","A","A")
# Declare all six possible levels up front so that unused ones are retained
person_cat <- factor(person_cat,levels=c("A","B","C","D","E","F"))
set1.df <- data.frame(person_id,person_cat)
person_id <- c(11,12,13,14,15,16,17,18,19,20)
person_cat <- c("A","B","C","A","B","C","E","E","F","A")
person_cat <- factor(person_cat,levels=c("A","B","C","D","E","F"))
set2.df <- data.frame(person_id,person_cat)
# drop=FALSE keeps a dummy column for every level, including unused ones
dummies1 <- dummy(set1.df[,2],drop=FALSE)
dummies2 <- dummy(set2.df[,2],drop=FALSE)
dummies1
dummies2
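The same idea (fix the full set of levels first, then build the indicators) can also be sketched in base R without the dummies package. This is only an illustration under stated assumptions: the helper name one_hot, its prefix argument, and the resulting column names are hypothetical and not part of the original post, and model.matrix stands in for dummy().

```r
# Full set of levels known in advance ("A" to "F" in the example)
all_levels <- c("A", "B", "C", "D", "E", "F")

# Hypothetical helper: one-hot encode x against a fixed set of levels.
# Note: values of x outside all_levels become NA, and model.matrix()
# drops those rows under the default na.action.
one_hot <- function(x, all_levels, prefix = "cat") {
  f <- factor(x, levels = all_levels)  # unused levels are kept in the factor
  m <- model.matrix(~ f - 1)           # no intercept: one 0/1 column per level
  colnames(m) <- paste0(prefix, all_levels)
  m
}

set1_cat <- c("A", "B", "C", "A", "B", "C", "D", "A", "A", "A")
set2_cat <- c("A", "B", "C", "A", "B", "C", "E", "E", "F", "A")

d1 <- one_hot(set1_cat, all_levels, prefix = "set1_")
d2 <- one_hot(set2_cat, all_levels, prefix = "set2_")

colSums(d1)  # set1_E and set1_F sum to 0
colSums(d2)  # set2_D sums to 0
```

Because the levels are passed in explicitly, the same call can be applied to each data frame separately (for instance with lapply over a list of data frames), and every result has the same number of columns in the same order, which keeps a later rbind straightforward.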