
Simple way of creating dummy variable in R


I want to know the simplest way to create a dummy variable. I found many similar questions about dummy variables, but the answers either rely on external packages or are too technical.

I have data like this :

df <- data.frame(X=rnorm(10,0,1), Y=rnorm(10,0,1))
df$Z <- c(NA, diff(df$X)*diff(df$Y))

Z is a new variable within df: the product of the change in X and the change in Y. Now I want to create a dummy variable D in df such that D = 1 if Z < 0 and D = 0 if Z > 0.

I tried in this way :

df$D <- NA
for(i in 2:10) {
if(df$Z[i] <0 ) {
D[i] ==1
}
if(df$Z[i] >0 ) {
D[i] ==0
}}

This is not working. I want to know why the code above does not work, whether there is an easier way to do this, and how dummy variables can be created in R without any external packages, with a little explanation.


Solution

  • We can create a logical vector with df$Z < 0 and then coerce it to binary (0/1) by wrapping it in the unary +.

     df$D <- +(df$Z <0)
    

    Or as @BenBolker mentioned, the canonical options would be

    as.numeric(df$Z < 0)
    

    or

    as.integer(df$Z < 0)
    

    Benchmarks

    set.seed(42)
    Z <- rnorm(1e7)
    library(microbenchmark)
    microbenchmark(akrun= +(Z < 0), etienne = ifelse(Z < 0, 1, 0),
               times= 20L,  unit='relative')
    #    Unit: relative
    #    expr      min       lq     mean   median      uq      max neval
    #   akrun  1.00000  1.00000 1.000000  1.00000 1.00000 1.000000    20
    # etienne 12.20975 10.36044 9.926074 10.66976 9.32328 7.830117    20
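
As for why the loop in the question fails: `D[i] == 1` is a comparison, not an assignment, and it refers to a free-standing object `D` rather than the column `df$D`, so R stops with "object 'D' not found". The following is a minimal sketch of a corrected loop next to the vectorized form. The small fixed `Z` values are made up for illustration so the result can be checked by hand, and the `is.na()` guard is an addition here, since `if (NA)` would raise an error.

```r
# Illustrative data only (not the random data from the question).
df <- data.frame(Z = c(NA, -1.5, 2.0, -0.3, 0.7))

# Corrected loop: assign with `<-` into df$D, and skip NA values of Z.
df$D <- NA
for (i in seq_len(nrow(df))) {
  if (is.na(df$Z[i])) next      # leave D as NA where Z is NA
  if (df$Z[i] < 0) {
    df$D[i] <- 1
  } else if (df$Z[i] > 0) {
    df$D[i] <- 0
  }
}

# Vectorized equivalent: NA in Z propagates to NA in the result.
D_vec <- as.integer(df$Z < 0)

print(df$D)
print(D_vec)
```

One difference to keep in mind: for a value of exactly Z == 0, the loop leaves D as NA, while `as.integer(df$Z < 0)` returns 0.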