Tags: r, loops, vectorization, assign, dummy-variable

R: Creating dummy variables for values of one variable conditional on another variable


ORIGINAL QUESTION

I want to add a series of dummy variables to a data frame, one for each value of x in that data frame, with NA in place of 0 wherever another variable is NA. For example, suppose I have the data frame below:

x <- 1:5
y <- c(NA, 1, NA, 0, NA)
z <- data.frame(x, y)

I am looking to produce:

  • var1 such that: z$var1 is 1 if x == 1; otherwise NA if y is NA (that is, is.na(y)); otherwise 0.
  • var2 such that: z$var2 is 1 if x == 2; otherwise NA if y is NA; otherwise 0.
  • var3, and so on for each value of x.

I can't seem to figure out how to vectorize this. I am looking for a solution that scales to a large number of distinct values of x.

UPDATE

There was some confusion that I wanted to iterate through each index of x. That is not what I am looking for; I want a solution that creates one variable for each unique value of x. Given the data below as input:

x <- c(1,1,2,3,9)
y <- c(NA, 1, NA, 0, NA)
z <- data.frame(x, y)

I am looking for z$var1, z$var2, z$var3, z$var9 where z$var1 <- c(1, 1, NA, 0, NA) and z$var2 <- c(NA, 0, 1, 0, NA). The original solution produces z$var1 <- z$var2 <- c(1,1,NA,0,NA).


Solution

  • You can use ifelse(), which is vectorized, to construct the variables:

    cbind(z, setNames(data.frame(sapply(unique(x), function(i) ifelse(x == i, 1, ifelse(is.na(y), NA, 0)))), 
                      paste("var", unique(x), sep = "")))
    
      x  y var1 var2 var3 var9
    1 1 NA    1   NA   NA   NA
    2 1  1    1    0    0    0
    3 2 NA   NA    1   NA   NA
    4 3  0    0    0    1    0
    5 9 NA   NA   NA   NA    1
    

    Update:

    cbind(z, data.frame(sapply(unique(x), function(i) ifelse(x == i, 1, ifelse(is.na(y), NA, 0)))))
      x  y X1 X2 X3 X4
    1 1 NA  1 NA NA NA
    2 1  1  1  0  0  0
    3 2 NA NA  1 NA NA
    4 3  0  0  0  1  0
    5 9 NA NA NA NA  1
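
The same nested ifelse() idea can be wrapped in a small function so that it reads the columns from the data frame itself rather than from the global x and y. This is only a sketch: the name add_dummies and its arguments x_col, y_col and prefix are hypothetical and not part of the original answer, and it assumes NA values in x should not get a dummy column of their own.

```r
# Hypothetical helper (not from the original answer): adds one dummy column
# per unique non-NA value of df[[x_col]].
add_dummies <- function(df, x_col = "x", y_col = "y", prefix = "var") {
  x <- df[[x_col]]
  y <- df[[y_col]]
  vals <- unique(x[!is.na(x)])  # assumption: skip NA values of x

  # One column per unique value: 1 where x matches, otherwise NA where y is
  # missing, otherwise 0. ifelse() is vectorized over the whole column.
  cols <- lapply(vals, function(v) {
    ifelse(x == v, 1, ifelse(is.na(y), NA_real_, 0))
  })
  names(cols) <- paste0(prefix, vals)

  cbind(df, as.data.frame(cols))
}

z <- data.frame(x = c(1, 1, 2, 3, 9), y = c(NA, 1, NA, 0, NA))
res <- add_dummies(z)
res$var2  # NA 0 1 0 NA
```

Building a named list with lapply() and converting it once keeps the column names tied to the values of x, so the setNames() step from the answer above is not needed separately.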