
Filling a dataframe with a 0 and 1 vector


I have a data frame with two columns (A and B). Column A is categorical and column B is numeric (ranging from 0.0 to 1.0). I want to create a column C whose values are 1 when the value in column B is greater than or equal to 0.5, and 0 when it is less than 0.5. Any suggestions on how to do this? The final df should look like this:

A = c('spA', 'spB', 'spC', 'spD') 
B = c(0.25, 0.15, 0.50, 0.75) 
C = c(0,0,1,1) 
df = data.frame(A, B, C)

Solution

  • Just compare the column against 0.5 and convert the resulting logical vector to numeric:

    A = c('spA', 'spB', 'spC', 'spD')  
    B = c(0.25, 0.15, 0.50, 0.75)  
    df = data.frame(A, B)
    
    df$C <- as.numeric(df$B >= 0.5)
    

    @David Arenburg: Here is a speed comparison of all 3 solutions pointed out above.
    To be honest, I don't know why it is that much faster.

    library(microbenchmark)  # fails immediately if the package is not installed
    microbenchmark(
      df$C <- ifelse(df$B>=0.5, 1, 0),
      transform(df, C = as.numeric(B >= 0.5)),
      df$C <- as.numeric(df$B>=0.5)
      )
    

    Result:

    Unit: microseconds
                                        expr     min       lq   median       uq    max neval
           df$C <- ifelse(df$B >= 0.5, 1, 0)  33.585  35.7580  38.1285  41.6845 140.66   100
     transform(df, C = as.numeric(B >= 0.5)) 143.821 149.7470 155.0815 164.5640 284.48   100
             df$C <- as.numeric(df$B >= 0.5)  20.546  22.9165  24.2995  27.2630  53.34   100
    

    EDIT: Larger dataset

    df <- data.frame(B=runif(100000))
    
    library(microbenchmark)  # fails immediately if the package is not installed
    microbenchmark(
      df$C <- ifelse(df$B>=0.5, 1, 0),
      transform(df, C = as.numeric(B >= 0.5)),
      df$C <- as.numeric(df$B>=0.5)
      )
    
    Unit: microseconds
                                        expr       min        lq     median         uq       max neval
           df$C <- ifelse(df$B >= 0.5, 1, 0) 31620.826 33623.452 34529.8380 55652.9290 62707.064   100
     transform(df, C = as.numeric(B >= 0.5))   811.561   979.286  1032.6255  1248.5550  2333.137   100
             df$C <- as.numeric(df$B >= 0.5)   606.498   764.542   808.0045   979.0875 23805.112   100
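
A minimal sketch of why the one-liner works, using the toy data from the question: the comparison `df$B >= 0.5` is vectorised and returns a logical vector, and `as.numeric()` maps FALSE to 0 and TRUE to 1. The sketch also checks that the three benchmarked expressions produce the same values. The names `flag`, `c_numeric`, `c_ifelse` and `df2` are only illustrative and are not from the original posts.

```r
# Toy data from the question
df <- data.frame(
  A = c("spA", "spB", "spC", "spD"),
  B = c(0.25, 0.15, 0.50, 0.75)
)

# The comparison is vectorised and returns a logical vector
flag <- df$B >= 0.5

# Coercing logical to numeric maps FALSE to 0 and TRUE to 1
c_numeric <- as.numeric(flag)

# ifelse() builds the same values element by element
c_ifelse <- ifelse(df$B >= 0.5, 1, 0)

# transform() returns a modified copy, so the result has to be assigned
df2 <- transform(df, C = as.numeric(B >= 0.5))

# All three approaches agree on this data
stopifnot(identical(c_numeric, c_ifelse), identical(c_numeric, df2$C))

df$C <- c_numeric
print(df)
```

Note that `transform()` does not modify `df` in place, so in the benchmarks above its result is computed and then discarded, whereas the other two expressions assign into `df`.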