I need to create some unorthodox dummy variables and I am having some trouble. In my dataset, each teacher can teach multiple classes. I'm building a multilevel dataset, so it is OK that teacher IDs are duplicated.
Here is an example of the data:
# generate data
teacher.id <- c(1:5, 1:5)
class.taught <- c("ELA", "Math", "Science", "ELA", "Math", "Science", "Math", "ELA", "ELA", "Math")
# combine into data frame
dat <- data.frame(teacher.id, class.taught)
As you can see, teachers 1 and 3 each teach two different classes (ELA and Science).
The conventional approach to creating dummy variables yields:
# example of what I have done so far
dat$teach.ELA <- ifelse(dat$class.taught == "ELA", 1, 0)
dat$teach.MATH <- ifelse(dat$class.taught == "Math", 1, 0)
dat$teach.SCIENCE <- ifelse(dat$class.taught == "Science", 1, 0)
dat
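However, I would like each row to flag every class its teacher teaches anywhere in the data, not just the class on that row. Here is how the new dummy variables should look: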
desired.ELA <- c(1,0,1,1,0,1,0,1,1,0)
desired.MATH <- c(0,1,0,0,1,0,1,0,0,1)
desired.SCIENCE <- c(1,0,1,0,0,1,0,1,0,0)
dat.2 <- data.frame(dat, desired.ELA, desired.MATH, desired.SCIENCE)
dat.2
My hunch is that I need to loop through the IDs to create these, but beyond that I don't see how to get there.
Here is a base R method. The idea is that you create the dummies for each teacher and then merge these onto the original data:
# get dummies for each teacher
temp <- as.data.frame(with(dat, table(teacher.id, class.taught) > 0))
temp$teacher.id <- as.integer(row.names(temp))
# merge onto dataset (note: merge() sorts the result by teacher.id by default)
merge(dat, temp, by="teacher.id")
The new columns are logical (TRUE/FALSE). You could coerce them to integer if that really bothers you, but R converts TRUE/FALSE to 1/0 automatically whenever they are used in arithmetic.
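If you do want integer 0/1 columns, and want the rows left in their original order rather than sorted by merge(), something along these lines should work. This is only a sketch: it rebuilds the example data from the question, and the names ind, dummies, dat.3 and the desired.* column labels are my own choices for illustration.

```r
# Rebuild the example data from the question
teacher.id <- c(1:5, 1:5)
class.taught <- c("ELA", "Math", "Science", "ELA", "Math",
                  "Science", "Math", "ELA", "ELA", "Math")
dat <- data.frame(teacher.id, class.taught)

# Teacher-by-class indicator matrix; adding 0L turns TRUE/FALSE into 1/0
ind <- (with(dat, table(teacher.id, class.taught)) > 0) + 0L

# Pick out each row's teacher by name, which keeps the original row order
dummies <- ind[as.character(dat$teacher.id), , drop = FALSE]
rownames(dummies) <- NULL

# Illustrative column names, mirroring the desired.* vectors in the question
colnames(dummies) <- paste0("desired.", toupper(colnames(dummies)))

dat.3 <- cbind(dat, dummies)
dat.3
```

Indexing the matrix by teacher ID does the same job as the merge, just without reordering anything, so the result can be compared row by row against dat.2 from the question.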