I want to know how a dummy variable can be created in a simple way. I found many similar questions about dummy variables, but the answers either rely on external packages or are too technical.
I have data like this:
df <- data.frame(X=rnorm(10,0,1), Y=rnorm(10,0,1))
df$Z <- c(NA, diff(df$X)*diff(df$Y))
Z is a new variable in df, i.e. the product of the change in X and the change in Y. Now I want to create a dummy variable D in df such that D = 1 if Z < 0 and D = 0 if Z > 0.
I tried it this way:
df$D <- NA
for(i in 2:10) {
if(df$Z[i] <0 ) {
D[i] ==1
}
if(df$Z[i] >0 ) {
D[i] ==0
}}
This is not working. I want to know why the above code does not work, whether there is an easier way of doing this, and how dummy variables can be created in R without using any external packages, with a little bit of explanation.
We can create a logical vector with df$Z < 0 and then coerce it to binary by wrapping it in a unary +.
df$D <- +(df$Z <0)
Or as @BenBolker mentioned, the canonical options would be
as.numeric(df$Z < 0)
or
as.integer(df$Z < 0)
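As for why the loop in the question fails: `==` tests equality rather than assigning, and `D[i]` refers to a standalone object `D` that does not exist, not to the column `df$D`. A minimal sketch of a corrected loop, using only base R, might look like the following. The `set.seed(1)` call is an assumption added here for reproducibility and is not in the original post.

```r
# Recreate data in the same shape as the question's example
set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(X = rnorm(10, 0, 1), Y = rnorm(10, 0, 1))
df$Z <- c(NA, diff(df$X) * diff(df$Y))

df$D <- NA
for (i in 2:nrow(df)) {      # start at 2 because Z[1] is NA
  if (df$Z[i] < 0) {
    df$D[i] <- 1             # assign with <- and index the column inside df
  } else if (df$Z[i] > 0) {
    df$D[i] <- 0
  }
}

# Compare against the vectorized version; NA is kept where Z is NA
same <- identical(as.numeric(df$D), as.numeric(df$Z < 0))
```

The loop is shown only to explain the error. The vectorized forms above do the same work in one line and propagate the leading `NA` in `Z` without special handling.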
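In a benchmark on 1e7 values, +(Z < 0) was roughly 10 times faster by median time than ifelse(Z < 0, 1, 0):
set.seed(42)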
Z <- rnorm(1e7)
library(microbenchmark)
microbenchmark(akrun= +(Z < 0), etienne = ifelse(Z < 0, 1, 0),
times= 20L, unit='relative')
# Unit: relative
# expr min lq mean median uq max neval
# akrun 1.00000 1.00000 1.000000 1.00000 1.00000 1.000000 20
# etienne 12.20975 10.36044 9.926074 10.66976 9.32328 7.830117 20