Generate a dichotomous variable from a factor


I have a dataframe with a factor in it, such as:

> var1 <- gl(10, 2, labels=letters[1:10])
> var2 <- c(1:20)
> data <- data.frame(var1=var1,var2=var2)
> data
   var1 var2
1     a    1
2     a    2
3     b    3
4     b    4
5     c    5
6     c    6
7     d    7
...
20    j   20

I'm trying to generate a dichotomous variable defined as 1 and 0 for specific values of var1. However, when I enter the following code:

> data <- data.frame(var1=var1,var2=var2)
> data$var3 <- c(1[which(var1=="a" | var1=="b" | var1=="c" | var1=="d" | 
var1=="e")], 0[which(var1=="f" | var1=="g" | var1=="h" | var1=="i" | var1=="j")])

I get the following:

> data$var3
 [1]  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

The first item is changed to a 1, but the rest become NAs. How can I obtain the results that I want?


Solution

  • It looks like you want a vector that is 1 where var1 is a, b, c, d, or e, and 0 otherwise. Your attempt fails because 1[which(...)] indexes the single-element vector 1 at positions 1 through 10, so only position 1 returns a value and the rest are NA. Likewise, 0[which(...)] indexes the single-element vector 0 at positions 11 through 20, which are all out of range. Instead, use %in%, which returns a logical vector, and wrap it in as.numeric to convert TRUE and FALSE to 1 and 0.

    Example:

    data$var3 <- as.numeric(data$var1 %in% c("a", "b", "c", "d", "e"))
    ## Or, shorter:
    ## data$var3 <- as.numeric(data$var1 %in% letters[1:5])
    

    > head(data, 3)
      var1 var2 var3
    1    a    1    1
    2    a    2    1
    3    b    3    1
    > tail(data, 3)
       var1 var2 var3
    18    i   18    0
    19    j   19    0
    20    j   20    0
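
    To make the difference concrete, here is a minimal sketch that reproduces the out-of-range indexing from the question and then builds the indicator two ways. The names idx, failed, and var3_ifelse are made up for illustration and do not appear in the question.

    ```r
    # Rebuild the example data from the question.
    var1 <- gl(10, 2, labels = letters[1:10])
    data <- data.frame(var1 = var1, var2 = 1:20)

    # Why the original attempt fails: `1` is a length-1 vector, so any
    # index other than 1 is out of range and returns NA.
    idx <- which(var1 %in% letters[1:5])   # positions 1 through 10
    failed <- 1[idx]                        # 1 followed by nine NAs

    # Two equivalent ways to build the 0/1 indicator.
    data$var3 <- as.integer(data$var1 %in% letters[1:5])
    data$var3_ifelse <- ifelse(data$var1 %in% letters[1:5], 1, 0)

    # Both versions agree on every row.
    identical(as.numeric(data$var3), data$var3_ifelse)
    ```

    as.integer stores the indicator as integers rather than doubles, and ifelse is handy if you later want labels other than 1 and 0. Either should work in place of as.numeric here.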