r dataframe dummy-variable

Creating an unorthodox dummy variable


I need to create some unorthodox dummy variables and I am having some trouble. In my dataset, each teacher can teach multiple classes. I'm building a multilevel dataset, so it is fine that there are duplicate teacher IDs.

Here is an example of the data:

#generate data
teacher.id <- c(1:5, 1:5)
class.taught <- c("ELA", "Math", "Science", "ELA", "Math", "Science", "Math", "ELA", "ELA", "Math")

# combine into data frame
dat <- data.frame(teacher.id, class.taught)

As you can see, teachers with IDs 1 and 3 each teach two different classes.

The conventional approach to creating dummy variables only flags the class taught in each individual row:

# example of what I have done so far
dat$teach.ELA <- ifelse(dat$class.taught == "ELA", 1, 0)
dat$teach.MATH <- ifelse(dat$class.taught == "Math", 1, 0)
dat$teach.SCIENCE <- ifelse(dat$class.taught == "Science", 1, 0)
dat

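However, I want each dummy to be 1 on every row for a teacher who teaches that subject in any of their classes. Here is how I would like the new dummy variables to look: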

desired.ELA <- c(1,0,1,1,0,1,0,1,1,0)
desired.MATH <- c(0,1,0,0,1,0,1,0,0,1)
desired.SCIENCE <- c(1,0,1,0,0,1,0,1,0,0)
dat.2 <- data.frame(dat, desired.ELA, desired.MATH, desired.SCIENCE)
dat.2

My hunch is that I need to loop through the IDs to create these, but beyond that I don't see how to get what I want.


Solution

  • Here is a base R method. The idea is that you create the dummies for each teacher and then merge these onto the original data:

    # get dummies for each teacher
    temp <- as.data.frame(with(dat, table(teacher.id, class.taught) > 0))
    temp$teacher.id <- as.integer(row.names(temp))
    
    # merge onto dataset
    merge(dat, temp, by="teacher.id")
    

    The new columns are logical (TRUE/FALSE). You could coerce them to integer if that really bugs you, but R treats TRUE as 1 and FALSE as 0 in arithmetic, so it is usually unnecessary.
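
    If you do want 0/1 integer columns and the original row order (merge() sorts its result by teacher.id by default), here is a rough alternative sketch, not the method above. It swaps the table/merge step for a grouped maximum with ave(): for each subject, build the ordinary row-level dummy, then take its maximum within each teacher. The teach.* column names are just my choice for illustration, and I'm assuming the data looks like the example in the question.

```r
# Example data from the question
teacher.id <- c(1:5, 1:5)
class.taught <- c("ELA", "Math", "Science", "ELA", "Math",
                  "Science", "Math", "ELA", "ELA", "Math")
dat <- data.frame(teacher.id, class.taught)

# One pass per subject; as.character() guards against class.taught being a factor
for (subj in unique(as.character(dat$class.taught))) {
  # Row-level dummy: 1 if this row's class is the subject, else 0
  row.dummy <- as.integer(dat$class.taught == subj)
  # Teacher-level dummy: max of the row-level dummy within each teacher.id
  # (column name "teach.<subject>" is a hypothetical naming choice)
  dat[[paste0("teach.", subj)]] <- ave(row.dummy, dat$teacher.id, FUN = max)
}

dat
```

    Because ave() returns a vector the same length as its input, the rows of dat stay in their original order and no merge is needed.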