I have a data frame with two columns, A and B. Column A is categorical and column B is numeric (ranging from 0.0 to 1.0). I want to create a column C whose value is 1 when the value in column B is greater than or equal to 0.5, and 0 when it is less than 0.5. Any suggestions on how to do this? The final df should look like this:
A = c('spA', 'spB', 'spC', 'spD')
B = c(0.25, 0.15, 0.50, 0.75)
C = c(0,0,1,1)
df = data.frame(A, B, C)
Just use a vectorized comparison and convert the logical result to numeric:
A = c('spA', 'spB', 'spC', 'spD')
B = c(0.25, 0.15, 0.50, 0.75)
df = data.frame(A, B)
df$C <- as.numeric(df$B >= 0.5)  # B >= 0.5 gives TRUE/FALSE; as.numeric() turns it into 1/0
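A quick sanity check, as a small sketch that rebuilds the question's data, confirming that the comparison approach reproduces the expected C column. Note the boundary case B = 0.50, which >= maps to 1:

```r
# Rebuild the example data from the question
A <- c('spA', 'spB', 'spC', 'spD')
B <- c(0.25, 0.15, 0.50, 0.75)
df <- data.frame(A, B)

# B >= 0.5 is a logical vector; as.numeric() maps FALSE/TRUE to 0/1
df$C <- as.numeric(df$B >= 0.5)

# 0.50 sits exactly on the threshold, so >= assigns it 1
stopifnot(identical(df$C, c(0, 0, 1, 1)))
print(df)
```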
@David Arenburg: Here is a speed comparison of all 3 solutions pointed out above.
To be honest, I don't know why it is that much faster.
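One plausible explanation, offered as a reading of base R's ifelse() source rather than a profiled result: ifelse() is a general-purpose R-level function that finds the TRUE and FALSE positions with which(), recycles yes and no to full length, and assigns them into a copy of the test vector, whereas as.numeric() on a logical vector is a single internal coercion. The sketch below (random data, arbitrary seed) checks that all three expressions produce the same column, so the timing differences reflect the work each one does rather than different outputs:

```r
set.seed(42)  # arbitrary seed so the check is reproducible
df <- data.frame(B = runif(100000))

c_ifelse    <- ifelse(df$B >= 0.5, 1, 0)
c_transform <- transform(df, C = as.numeric(B >= 0.5))$C
c_numeric   <- as.numeric(df$B >= 0.5)

# All three yield the same plain double vector of 0s and 1s
stopifnot(identical(c_ifelse, c_transform),
          identical(c_transform, c_numeric))
```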
require(microbenchmark)
microbenchmark(
df$C <- ifelse(df$B>=0.5, 1, 0),
transform(df, C = as.numeric(B >= 0.5)),
df$C <- as.numeric(df$B>=0.5)
)
Result:
Unit: microseconds
                                    expr     min       lq   median       uq    max neval
       df$C <- ifelse(df$B >= 0.5, 1, 0)  33.585  35.7580  38.1285  41.6845 140.66   100
 transform(df, C = as.numeric(B >= 0.5)) 143.821 149.7470 155.0815 164.5640 284.48   100
         df$C <- as.numeric(df$B >= 0.5)  20.546  22.9165  24.2995  27.2630  53.34   100
EDIT: Larger dataset
df <- data.frame(B=runif(100000))
require(microbenchmark)
microbenchmark(
df$C <- ifelse(df$B>=0.5, 1, 0),
transform(df, C = as.numeric(B >= 0.5)),
df$C <- as.numeric(df$B>=0.5)
)
Unit: microseconds
                                    expr       min        lq     median         uq       max neval
       df$C <- ifelse(df$B >= 0.5, 1, 0) 31620.826 33623.452 34529.8380 55652.9290 62707.064   100
 transform(df, C = as.numeric(B >= 0.5))   811.561   979.286  1032.6255  1248.5550  2333.137   100
         df$C <- as.numeric(df$B >= 0.5)   606.498   764.542   808.0045   979.0875 23805.112   100